[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087681#comment-15087681
 ] 

Hudson commented on HBASE-14468:


FAILURE: Integrated in HBase-0.98-matrix #283 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/283/])
HBASE-14468 Compaction improvements: FIFO compaction policy. (Vladimir Rodionov) 
(larsh: rev 912b42786fbb1374f42648aceaa813ab009e3a9b)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/FIFOCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestFIFOCompactionPolicy.java
* 
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TimeOffsetEnvironmentEdge.java


> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, Performance
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.17
>
> Attachments: 14468-0.98-v2.txt, 14468-0.98.txt, HBASE-14468-v1.patch, 
> HBASE-14468-v10.patch, HBASE-14468-v2.patch, HBASE-14468-v3.patch, 
> HBASE-14468-v4.patch, HBASE-14468-v5.patch, HBASE-14468-v6.patch, 
> HBASE-14468-v7.patch, HBASE-14468-v8.patch, HBASE-14468-v9.patch, 
> HBASE-14468.add.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> The FIFO compaction policy selects only those files in which all cells have 
> expired. The column family MUST have a non-default TTL. 
> Essentially, the FIFO compactor does only one job: it collects expired store 
> files. These are some applications that could benefit the most:
> # Use it for very high-volume raw data which has a low TTL and which is the 
> source of other, derived data (produced by additional processing). Example: raw 
> time-series vs. time-based rollup aggregates and compacted time-series. We 
> collect raw time-series and store them in a CF with the FIFO compaction policy; 
> periodically we run a task which creates rollup aggregates and compacts the 
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD). 
> Say we have a local SSD (1 TB) which we can use as a block cache. There is no 
> need for compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and I/O (disk and 
> network), and we do not evict hot data from the block cache. The result: improved 
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> From HBase shell:
> {code}
> create 'x',{NAME=>'y', TTL=>'30'}, {CONFIGURATION => 
> {'hbase.hstore.defaultengine.compactionpolicy.class' => 
> 'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy', 
> 'hbase.hstore.blockingStoreFiles' => 1000}}
> {code}
> Although region splitting is supported, for optimal performance it should be 
> disabled, either by explicitly setting DisabledRegionSplitPolicy or by 
> setting ConstantSizeRegionSplitPolicy with a very large max region size. You 
> will also have to increase the store's blocking file number, 
> *hbase.hstore.blockingStoreFiles*, to a very large value (there is a sanity 
> check on the table/column family configuration in the case of FIFO compaction, 
> and the minimum value for the number of blocking files is 1000).
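> For illustration, here is a minimal sketch of such a table setup (the 
> split-policy and blocking-files calls are standard HBase client API; the 
> table and family names are assumptions, not part of this patch):
> {code}
> // Hedged sketch: FIFO compaction with splitting disabled and a raised
> // blocking-store-files limit.
> HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("x"));
> desc.setRegionSplitPolicyClassName(DisabledRegionSplitPolicy.class.getName());
> desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000");
> HColumnDescriptor family = new HColumnDescriptor("y");
> family.setTimeToLive(30); // a non-default TTL is required
> family.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> desc.addFamily(family);
> {code}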
>  
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
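> As a hedged illustration of these constraints, here is a client-side sanity 
> check one could write (the helper name is hypothetical, not part of the patch):
> {code}
> // Hypothetical helper mirroring the two limitations above.
> static boolean fifoCompactionApplicable(HColumnDescriptor family) {
>   return family.getMinVersions() == 0 // MIN_VERSION must be 0
>     && family.getTimeToLive() != HColumnDescriptor.DEFAULT_TTL; // TTL != FOREVER
> }
> {code}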





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086059#comment-15086059
 ] 

Lars Hofhansl commented on HBASE-14468:
---

bq. HBASE-10141. Not sure if it is in 0.98.

That's in 0.98 via HBASE-12144.

[~vrodionov], any chance to try the scenario I mentioned above?
I.e. (1) define a TTL (perhaps a few minutes), (2) do a bunch of flushes within 
the TTL, (3) then stop all writes, (4) see if all prior store files are 
eventually collected without any write activity.

I might get time to test a bit more in 0.98; until then I'm going to hold off 
committing to 0.98.
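A minimal sketch of this scenario (an editor's illustration using 
HBaseTestingUtility; the table and family names are assumptions, not from 
this thread):
{code}
HBaseTestingUtility util = new HBaseTestingUtility();
util.startMiniCluster();
HColumnDescriptor cf = new HColumnDescriptor("y");
cf.setTimeToLive(120); // (1) a TTL of a few minutes
cf.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
  FIFOCompactionPolicy.class.getName());
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("x"));
desc.setConfiguration("hbase.hstore.blockingStoreFiles", "1000"); // satisfy the FIFO sanity check
desc.addFamily(cf);
util.getHBaseAdmin().createTable(desc);
Table table = util.getConnection().getTable(TableName.valueOf("x"));
for (int i = 0; i < 5; i++) { // (2) several flushes within the TTL
  table.put(new Put(Bytes.toBytes("r" + i))
    .addColumn(Bytes.toBytes("y"), Bytes.toBytes("q"), Bytes.toBytes(i)));
  util.getHBaseAdmin().flush(TableName.valueOf("x"));
}
// (3) stop all writes, then (4) wait past the TTL and check whether all
// prior store files are eventually collected without further writes.
{code}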




[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-06 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086102#comment-15086102
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

{quote}
Vladimir Rodionov, any chance to try the scenario I mentioned above?
{quote}

No writes, no flushes, no compactions. This is expected behaviour. All expired 
files will be purged when the next flush happens (after at most 1 h, when the 
periodic memstore flusher kicks in). I do not see any issues here, [~lhofhansl].
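For reference, a hedged sketch of how that periodic flush interval is 
controlled (the configuration key is the standard HBase setting; the value 
chosen here is an illustrative assumption):
{code}
// HRegion.MEMSTORE_PERIODIC_FLUSH_INTERVAL maps to
// "hbase.regionserver.optionalcacheflushinterval" (default: 1 hour, in ms).
Configuration conf = HBaseConfiguration.create();
conf.setInt(HRegion.MEMSTORE_PERIODIC_FLUSH_INTERVAL, 10 * 60 * 1000); // e.g. 10 minutes
{code}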



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086210#comment-15086210
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Hmm... We have the CompactionChecker in the RS, which wakes up every 10s. But I 
notice it actually does work only every 1,000th iteration by default.

Sorry for the false alarm.

The code does look a bit funky (HRegionServer.CompactionChecker.chore()):
If I read this right, then all stores that need compaction will trigger a 
compaction on the very same iteration (assuming all have the same default 
compaction multiplier). How does that not lead to compaction storms? But that is 
unrelated to this issue.
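The arithmetic behind the schedule described above, as a hedged illustration 
(default values assumed; the config keys are the standard HBase settings):
{code}
// Assumed defaults: hbase.server.thread.wakefrequency = 10 s (chore period),
// hbase.server.compactchecker.interval.multiplier = 1000.
long wakeFrequencyMs = 10 * 1000L;
long multiplier = 1000L;
long effectiveCheckPeriodMs = wakeFrequencyMs * multiplier; // ~2.8 h between full checks
{code}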



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-06 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086198#comment-15086198
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

Verified the test with a short MEMSTORE_PERIODIC_FLUSH_INTERVAL:
{code}
// The interval is in milliseconds; a tiny value forces frequent periodic flushes.
conf.setInt(HRegion.MEMSTORE_PERIODIC_FLUSH_INTERVAL, 6);
{code}

All expired store files were purged after the first periodic flush.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083558#comment-15083558
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

Looking into this.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083584#comment-15083584
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

The bug [~lhofhansl] found does not allow setting the FIFO compaction policy per 
column family. I reopened the JIRA and added a small fix. 



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083615#comment-15083615
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

[~lhofhansl]
{quote}
Lastly, does this require HBASE-14467 as well?
{quote}

I am pretty sure it does not. HBASE-14467 should be marked as invalid. Will 
double-check it.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083657#comment-15083657
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

[~lhofhansl]

I think you should backport HBASE-10141 to 0.98 as well. I am going to mark 
HBASE-14467 as invalid.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084618#comment-15084618
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

{quote}
Is there something else besides this patch I need that is 1.2+ but not in 0.98?
{quote}

HBASE-10141. Not sure if it is in 0.98.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084593#comment-15084593
 ] 

Hudson commented on HBASE-14468:


SUCCESS: Integrated in HBase-1.3-IT #421 (See 
[https://builds.apache.org/job/HBase-1.3-IT/421/])
HBASE-14468 addendum. (larsh: rev f4a66fc083a2c11e8de0ea283f45c5eeb0bac93e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java




[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084938#comment-15084938
 ] 

Hudson commented on HBASE-14468:


SUCCESS: Integrated in HBase-1.2-IT #379 (See 
[https://builds.apache.org/job/HBase-1.2-IT/379/])
HBASE-14468 addendum. (larsh: rev 883e3cdc34d29d81326b84220460016c38db7c6a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java




[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084931#comment-15084931
 ] 

Hudson commented on HBASE-14468:


FAILURE: Integrated in HBase-Trunk_matrix #612 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/612/])
HBASE-14468 addendum. (larsh: rev e8fbc9b43a3742358e0bdfe441ff4ca9d14e127b)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java




[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085018#comment-15085018
 ] 

Hudson commented on HBASE-14468:


SUCCESS: Integrated in HBase-1.2 #490 (See 
[https://builds.apache.org/job/HBase-1.2/490/])
HBASE-14468 addendum. (larsh: rev 883e3cdc34d29d81326b84220460016c38db7c6a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java




[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083931#comment-15083931
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

{quote}
Sorry for the spam here. I tested the following scenario:
compaction threshold set to 3
TTL 30s
flushed 5 files to disk within 30s.
Shouldn't I expect that after a while all files should be removed? I find them 
still hanging around after 1h.
{quote}

I do not see this on the master branch. Under constant write load, all expired 
store files get purged and archived continuously.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083967#comment-15083967
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12780482/14468-0.98-v2.txt
  against 0.98 branch at commit 1c4edd2ab702488e21c4929a998c49a4208633fc.
  ATTACHMENT ID: 12780482

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
28 warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not generate new 
checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this 
patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 zombies{color}. No zombie tests found running at the end of 
the build.

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17135//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17135//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17135//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17135//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17135//console

This message is automatically generated.


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084001#comment-15084001
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12780609/HBASE-14468.add.patch
  against master branch at commit 1c4edd2ab702488e21c4929a998c49a4208633fc.
  ATTACHMENT ID: 12780609

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.  Please justify why no new tests are needed for 
this patch.  Also please list what manual steps were performed to verify 
this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not generate new 
checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100 characters.

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this 
patch.

{color:green}+1 core tests{color}.  The patch passed unit tests.

{color:green}+1 zombies{color}. No zombie tests found running at the end of 
the build.

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17136//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17136//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17136//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/17136//console

This message is automatically generated.

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084015#comment-15084015
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

Addendum is ready to be committed (2.0, 1.2, 1.3).

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084059#comment-15084059
 ] 

Lars Hofhansl commented on HBASE-14468:
---

That's different. I see that as I flush: at flush time we do the compaction 
checks and remove the old files. But if I write a bunch of files within 30s 
(I flush manually) and then stop all activity, the files are never purged.

With longer TTLs of a few days or weeks that may be significant.

I tested this in the shell with:
{code}
put 'x', 'r1', 'y:1', 1; flush 'x'
{code}
changing 'r1' to 'r2', 'r3', etc. each time. That way I created 6 store files 
within 30s, and they never get collected, even though their TTL clearly 
expires (the compaction threshold is still 3). With the default policy they 
do get collected after a minute or so.

Is there something else besides this patch I need that is 1.2+ but not in 0.98?
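
For reference, a consolidated sketch of that repro in the HBase shell. It 
assumes the table was created with the FIFO policy and a 30s TTL, matching 
the example create command further down in this thread; table, family, and 
row names are illustrative:
{code}
# table with FIFO compaction, a 30-second TTL, and a high blocking-file limit
create 'x', {NAME => 'y', TTL => '30'}, {CONFIGURATION =>
  {'hbase.hstore.defaultengine.compactionpolicy.class' =>
     'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy',
   'hbase.hstore.blockingStoreFiles' => 1000}}

# each put/flush pair creates one store file; all six land within the TTL window
put 'x', 'r1', 'y:1', 1; flush 'x'
put 'x', 'r2', 'y:1', 1; flush 'x'
put 'x', 'r3', 'y:1', 1; flush 'x'
# ... repeat through 'r6', then stop all writes and wait past the 30s TTL
{code}
Per the observation above, once writes (and therefore flushes) stop, no 
further compaction check fires, so the expired files linger.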


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084061#comment-15084061
 ] 

Lars Hofhansl commented on HBASE-14468:
---

+1 on addendum.

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084099#comment-15084099
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Pushed addendum to 1.2, 1.3, and 2.0

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15084541#comment-15084541
 ] 

Hudson commented on HBASE-14468:


SUCCESS: Integrated in HBase-1.3 #481 (See 
[https://builds.apache.org/job/HBase-1.3/481/])
HBASE-14468 addendum. (larsh: rev f4a66fc083a2c11e8de0ea283f45c5eeb0bac93e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082378#comment-15082378
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Yeah... Will commit today or early tomorrow.


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082200#comment-15082200
 ] 

Andrew Purtell commented on HBASE-14468:


bq. here's a 0.98 patch

+1
Want to commit it?



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082514#comment-15082514
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Lastly, does this require HBASE-14467 as well?


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082522#comment-15082522
 ] 

Lars Hofhansl commented on HBASE-14468:
---

One more: should we recommend increasing hbase.hstore.compactionThreshold as 
well? (The default is 3.)
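
If so, a hypothetical way to set it would be in the same per-table 
CONFIGURATION map as the policy itself (the value 100 below is only 
illustrative, not a recommendation from this thread):
{code}
create 'x', {NAME => 'y', TTL => '30'}, {CONFIGURATION =>
  {'hbase.hstore.defaultengine.compactionpolicy.class' =>
     'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy',
   'hbase.hstore.blockingStoreFiles' => 1000,
   'hbase.hstore.compactionThreshold' => 100}}
{code}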


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082473#comment-15082473
 ] 

stack commented on HBASE-14468:
---

Yeah. It looks wrong (and it should be if (!...), dropping that == false).


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082566#comment-15082566
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Sorry for the spam here. I tested the following scenario:
* compaction threshold set to 3
* TTL 30s
* flushed 5 files to disk within 30s.

Shouldn't I expect all of the files to be removed after a while? I find them 
still hanging around after 1h.



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082479#comment-15082479
 ] 

Lars Hofhansl commented on HBASE-14468:
---

And (cosmetic) majorCompactionPeriod and splitPolicyClassName are not used 
anywhere.


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082532#comment-15082532
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Example shell command:
{code}
create 'x',{NAME=>'y', TTL=>'30'}, {CONFIGURATION => 
{'hbase.hstore.defaultengine.compactionpolicy.class' => 
'org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy', 
'hbase.hstore.blockingStoreFiles' => 1000}}
{code}



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2016-01-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082464#comment-15082464
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Actually there's a bug in HMaster.checkCompactionPolicy(...):
{code}
+for (HColumnDescriptor hcd : htd.getColumnFamilies()) {
+  String compactionPolicy =
+      hcd.getConfigurationValue(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY);
+  if (compactionPolicy == null) {
+    compactionPolicy = className;
+  }
+  if (className.equals(FIFOCompactionPolicy.class.getName()) == false) {   // <---
+    continue;
{code}

The indicated line should be:
{code}
+  if (compactionPolicy.equals(FIFOCompactionPolicy.class.getName()) == false) {
{code}
No? (this is wrong in all branches)
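
Putting the pieces together, a sketch of the corrected fragment (applying 
the comparison fix above, plus the if (!...) form suggested elsewhere in 
this thread; the rest of the method is assumed unchanged):
{code}
for (HColumnDescriptor hcd : htd.getColumnFamilies()) {
  String compactionPolicy =
      hcd.getConfigurationValue(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY);
  if (compactionPolicy == null) {
    compactionPolicy = className;
  }
  // Compare the effective per-family policy, not the table-level className.
  if (!compactionPolicy.equals(FIFOCompactionPolicy.class.getName())) {
    continue;
  }
  // ... remaining per-family FIFO validation unchanged ...
}
{code}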



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075680#comment-15075680
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

{quote}
 here's a 0.98 patch
{quote}

Thanks, [~lhofhansl]


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074732#comment-15074732
 ] 

Lars Hofhansl commented on HBASE-14468:
---

This is almost strictly add-on. We can backport to 0.98 I think. [~apurtell].

(We should also test whether this allows HBase to handle use cases that are 
typically handled by Kafka.)



[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057000#comment-15057000
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

OK, [~enis]. I will open a JIRA.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056979#comment-15056979
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

{quote}
Seems ChoreService.shutdown() should be synchronized. Open new issue?
{quote}

Not sure. This is called in HRegionServer.stopServiceThreads and is not 
supposed to be MT-safe. Unless we stop the same mini-cluster from multiple 
threads?
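
For illustration, a hypothetical standalone driver (not code from the HBase 
test suite) showing how such a race could surface when one thread iterates an 
unsynchronized HashMap while another mutates it:

{code}
import java.util.HashMap;
import java.util.Map;

// Two threads sharing an unsynchronized HashMap: one mutates while the other
// iterates it (as ChoreService.shutdown() does when logging outstanding
// chores). Being a race, the ConcurrentModificationException is likely here
// but not guaranteed on every run.
public class CmeRace {
  static final Map<Integer, Boolean> chores = new HashMap<Integer, Boolean>();

  public static void main(String[] args) throws InterruptedException {
    Thread writer = new Thread(new Runnable() {
      public void run() {
        for (int i = 0; i < 1000000; i++) chores.put(i, Boolean.TRUE);
      }
    });
    writer.start();
    for (int i = 0; i < 100; i++) {
      chores.keySet().toString(); // iterate while the writer mutates the map
    }
    writer.join();
  }
}
{code}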




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057022#comment-15057022
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

HBASE-14977.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056992#comment-15056992
 ] 

Enis Soztutar commented on HBASE-14468:
---

I think the issue is that the HashMaps inside ChoreService are not thread-safe. 
All usages except for shutdown() are guarded by synchronized. Even one thread 
calling shutdown() while other threads are trying to access the same hashmaps 
will throw {{ConcurrentModificationException}}. 
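
A minimal, self-contained sketch of the suggested fix (the names are 
illustrative and the real ChoreService internals differ, but the race is the 
same):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: a HashMap guarded by synchronized methods everywhere except shutdown().
class ChoreServiceSketch {
  private final Map<String, Boolean> scheduledChores = new HashMap<String, Boolean>();

  synchronized void scheduleChore(String chore) {
    scheduledChores.put(chore, Boolean.TRUE);
  }

  // Without 'synchronized' here, building the log message iterates the map
  // (keySet().toString()) while other threads may be mutating it, which can
  // throw ConcurrentModificationException.
  synchronized void shutdown() {
    System.out.println("Shutdown. Outstanding chores: " + scheduledChores.keySet());
    scheduledChores.clear();
  }
}
{code}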




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056833#comment-15056833
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

Another observation: 

It fails under JDK 1.8. My JDK version is the latest 1.7.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056829#comment-15056829
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

[~saint@gmail.com]

I see this exception during HBase mini cluster shutdown.

{code}
2015-12-09 17:51:28,444 ERROR [RS:0;asf901:37225] hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer(145): Exception in run
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
    at java.util.AbstractCollection.toString(AbstractCollection.java:461)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at org.apache.hadoop.hbase.ChoreService.shutdown(ChoreService.java:323)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2127)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1084)
    at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
    at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
    at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
    at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
    at java.lang.Thread.run(Thread.java:745)
{code}

This seems to have nothing to do with the test in question. I cannot reproduce 
this issue in my local environment. Any suggestions?





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056893#comment-15056893
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

I ran the test multiple times under 1.8_65. No issues. Is this issue 
reproducible in the Apache build system?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-14 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056929#comment-15056929
 ] 

Enis Soztutar commented on HBASE-14468:
---

Seems {{ChoreService.shutdown()}} should be {{synchronized}}. Open new issue? 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052188#comment-15052188
 ] 

stack commented on HBASE-14468:
---

Any chance to look at the above? Thanks.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-12-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049310#comment-15049310
 ] 

stack commented on HBASE-14468:
---

This test hung just now in a 1.3 build:



kalashnikov:hbase.git stack$ !520
python ./dev-support/findHangingTests.py 
https://builds.apache.org/job/HBase-1.3/jdk=latest1.8,label=Hadoop/425/consoleText
Fetching 
https://builds.apache.org/job/HBase-1.3/jdk=latest1.8,label=Hadoop/425/consoleText
Building remotely on H1 (Mapreduce Hadoop Pig Hdfs) in workspace 
/home/jenkins/jenkins-slave/workspace/HBase-1.3/jdk/latest1.8/label/Hadoop
Printing hanging tests
Hanging test : 
org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy

It looks like it is stuck waiting on a server to show up.


https://builds.apache.org/job/HBase-1.3/jdk=latest1.8,label=Hadoop/425/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy-output.txt


Please take a look when you get a chance. Thanks.








--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013118#comment-15013118
 ] 

Hudson commented on HBASE-14468:


FAILURE: Integrated in HBase-Trunk_matrix #480 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/480/])
HBASE-14468 Compaction improvements: FIFO compaction policy (Vladimir (enis: 
rev cf81b45f3771002146d6e8c4d995b12963aa685a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TimeOffsetEnvironmentEdge.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestFIFOCompactionPolicy.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/FIFOCompactionPolicy.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-19 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014251#comment-15014251
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

Created a separate documentation JIRA: HBASE-14847.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012960#comment-15012960
 ] 

Hudson commented on HBASE-14468:


FAILURE: Integrated in HBase-1.2 #384 (See 
[https://builds.apache.org/job/HBase-1.2/384/])
HBASE-14468 Compaction improvements: FIFO compaction policy (Vladimir (enis: 
rev a403429952c2bd3d0e951a3382d5708e7c8affee)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/FIFOCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestFIFOCompactionPolicy.java
* 
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TimeOffsetEnvironmentEdge.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013022#comment-15013022
 ] 

Hudson commented on HBASE-14468:


SUCCESS: Integrated in HBase-1.3-IT #322 (See 
[https://builds.apache.org/job/HBase-1.3-IT/322/])
HBASE-14468 Compaction improvements: FIFO compaction policy (Vladimir (enis: 
rev 8a69dd5b0898a92bd86d75d7535660322e27cf8e)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/FIFOCompactionPolicy.java
* 
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TimeOffsetEnvironmentEdge.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestFIFOCompactionPolicy.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013028#comment-15013028
 ] 

Hudson commented on HBASE-14468:


FAILURE: Integrated in HBase-1.3 #379 (See 
[https://builds.apache.org/job/HBase-1.3/379/])
HBASE-14468 Compaction improvements: FIFO compaction policy (Vladimir (enis: 
rev 8a69dd5b0898a92bd86d75d7535660322e27cf8e)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/FIFOCompactionPolicy.java
* 
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TimeOffsetEnvironmentEdge.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestFIFOCompactionPolicy.java





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013092#comment-15013092
 ] 

Hudson commented on HBASE-14468:


SUCCESS: Integrated in HBase-1.2-IT #292 (See 
[https://builds.apache.org/job/HBase-1.2-IT/292/])
HBASE-14468 Compaction improvements: FIFO compaction policy (Vladimir (enis: 
rev a403429952c2bd3d0e951a3382d5708e7c8affee)
* 
hbase-common/src/test/java/org/apache/hadoop/hbase/util/TimeOffsetEnvironmentEdge.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/compactions/TestFIFOCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/FIFOCompactionPolicy.java







[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010064#comment-15010064
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12772869/HBASE-14468-v10.patch
  against master branch at commit d6fdf92f9e5f4eaaf9300dce7f1f23adf228949c.
  ATTACHMENT ID: 12772869

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16559//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16559//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16559//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16559//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v10.patch, 
> HBASE-14468-v2.patch, HBASE-14468-v3.patch, HBASE-14468-v4.patch, 
> HBASE-14468-v5.patch, HBASE-14468-v6.patch, HBASE-14468-v7.patch, 
> HBASE-14468-v8.patch, HBASE-14468-v9.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-09 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997931#comment-14997931
 ] 

Enis Soztutar commented on HBASE-14468:
---

Thanks for updating the patch. The changes look good. Only a few comments: 

 - TimeOffsetEnvironmentEdge can be moved to src/test rather than src/main. 
 - testPurgeExpiredFiles() does not assert anything, just prints.
 - nit: you can use fail() from the Assert class for this: 
{code}
+  assertTrue(false);
{code}
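
For reference, a minimal sketch of that nit (the failure message here is
illustrative, not from the patch):
{code}
import static org.junit.Assert.fail;

// Instead of asserting a constant:
//   assertTrue(false);
// state the intent directly:
fail("expected the FIFO compaction check to reject this configuration");
{code}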

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch, HBASE-14468-v7.patch, HBASE-14468-v8.patch, 
> HBASE-14468-v9.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986255#comment-14986255
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12770173/HBASE-14468-v8.patch
  against master branch at commit f0176d942af26c8423d534d6b806b83817dd98e0.
  ATTACHMENT ID: 12770173

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestInterfaceAudienceAnnotations

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16354//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16354//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16354//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16354//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch, HBASE-14468-v7.patch, HBASE-14468-v8.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
>  
> h3. Limitations
> Do not use FIFO 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-11-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986560#comment-14986560
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12770193/HBASE-14468-v9.patch
  against master branch at commit f0176d942af26c8423d534d6b806b83817dd98e0.
  ATTACHMENT ID: 12770193

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16355//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16355//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16355//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16355//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch, HBASE-14468-v7.patch, HBASE-14468-v8.patch, 
> HBASE-14468-v9.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981136#comment-14981136
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12769572/HBASE-14468-v7.patch
  against master branch at commit 2288742c10e04d46212dbf70b931e460214992bf.
  ATTACHMENT ID: 12769572

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1733 checkstyle errors (more than the master's current 1732 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestRegionMover

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16284//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16284//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16284//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16284//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch, HBASE-14468-v7.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
>  
> h3. Limitations
> Do 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-28 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979161#comment-14979161
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

[~enis]

* sanityCheck is OK, will do that
* Auto-disable the major compactions, and set the blocking store files if they 
are not set? - OK
* Allow splits? Not sure. Will think about this.

{quote}
Can we use HStore.removeUnneededFiles() or storeEngine.getStoreFileManager() 
which already implements the is expired logic so that there is no duplication 
there?
{quote}

What duplication? FCP does not expire/purge files; HStore takes care of them.
 


> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-27 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977632#comment-14977632
 ] 

Enis Soztutar commented on HBASE-14468:
---

This is a good idea. We should add this to the list of compaction policies,
with good documentation. We have use cases where there is a TTL of a couple of
days; a metrics store is one such example, with raw data in a high-ingest
scenario.

For the patch itself, the first if is not needed if we are checking for the 
DisabledRSP anyway: 
{code}
+    if (splitPolicyClassName.equals(IncreasingToUpperBoundRegionSplitPolicy.class.getName())) {
+      throw new RuntimeException("Default split policy for FIFO compaction" +
+          " is not supported, aborting.");
+    } else if (!splitPolicyClassName.equals(DisabledRegionSplitPolicy.class.getName())) {
+      warn.append(":region splits must be disabled:");
+    }
{code}

Can we make it so that if a split happens we still compact the reference files, 
but we do not compact otherwise? We can also allow very-slow splits in the case 
where the reference files will be cleaned out due to TTL. In this case, a 
region can still split every TTL interval. 

Will the RuntimeExceptions thrown cause region opening to fail or the RS to
abort? Can we hook the verification code into
{{HMaster.sanityCheckTableDescriptor()}}, so that you cannot alter or create a
table with those settings? This will make for a much better user experience. 

Can we also simplify the configuration for these? Maybe we auto-disable major
compactions, and set the blocking store files if they are not set? 

Can we use HStore.removeUnneededFiles() or
{{storeEngine.getStoreFileManager()}}, which already implement the is-expired
logic, so that there is no duplication there? 
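
A rough sketch of the kind of check {{HMaster.sanityCheckTableDescriptor()}}
could perform; the descriptor accessors are standard API of that era, but the
wiring and messages are assumptions, not the committed patch:
{code}
// Hypothetical sketch: reject FIFO prerequisite violations at create/alter
// time instead of failing later, at region open.
String policy = htd.getConfigurationValue(
    DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY);
if (FIFOCompactionPolicy.class.getName().equals(policy)) {
  for (HColumnDescriptor hcd : htd.getColumnFamilies()) {
    if (hcd.getTimeToLive() == HColumnDescriptor.DEFAULT_TTL) {
      throw new DoNotRetryIOException(
          "FIFO compaction requires a non-default TTL: " + hcd.getNameAsString());
    }
    if (hcd.getMinVersions() > 0) {
      throw new DoNotRetryIOException(
          "FIFO compaction requires MIN_VERSIONS = 0: " + hcd.getNameAsString());
    }
  }
}
{code}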

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-23 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972017#comment-14972017
 ] 

Devaraj Das commented on HBASE-14468:
-

[~vrodionov] the test still relies on real clocks: it sleeps to make sure that
the TTL has expired, etc. What I meant to say earlier was that the test could
use an implementation of the clock that fakes the time (the implementation of
currentTimeMillis could be based on a monotonically increasing number, and
every invocation could return a new time).
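
A minimal sketch of such a fake clock, assuming the EnvironmentEdge interface
on master (where the method is currentTime(); older branches use
currentTimeMillis()):
{code}
// Hypothetical fake clock: every call returns a strictly larger timestamp,
// so a test can run past the TTL horizon without sleeping.
public class MonotonicTestEdge implements EnvironmentEdge {
  private long now = 1L;

  @Override
  public synchronized long currentTime() {
    return now += 1000; // advance one "second" per invocation
  }
}

// In the unit test, before exercising the compaction policy:
EnvironmentEdgeManager.injectEdge(new MonotonicTestEdge());
{code}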

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970117#comment-14970117
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12768121/HBASE-14468-v6.patch
  against master branch at commit 467bc098a9512afca38356da56d92c351f15b042.
  ATTACHMENT ID: 12768121

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1734 checkstyle errors (more than the master's current 1733 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn post-site goal 
to fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16181//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16181//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16181//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16181//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch, 
> HBASE-14468-v6.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-16 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961618#comment-14961618
 ] 

Devaraj Das commented on HBASE-14468:
-

Also consider using EnvironmentEdgeManager instead of
System.currentTimeMillis. Your UT can use that as well and run without
depending on real clocks. 

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-16 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961608#comment-14961608
 ] 

Devaraj Das commented on HBASE-14468:
-

Looks okay to me. The policy will address some use cases and might be
problematic for others. We should clearly call out the cases where this
should/should-not be used (the limitations list in the description is a good
start). On the code itself, you should have timeouts on the tests. You should
also log a warning when hasExpiredStores/getExpiredStores returns nothing
after a couple of invocations and yet the number of store files keeps growing
(that would indicate the data does not conform to the compaction policy's
expectations).
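
Something along these lines, as a sketch only (field and method names are
illustrative, not from the patch):
{code}
// Illustrative guard: repeated empty selections while the store keeps
// accumulating files suggest the data violates the policy's assumptions.
private int emptySelectionCount = 0;
private int lastStoreFileCount = 0;

void checkSelection(Collection<StoreFile> candidates, Collection<StoreFile> expired) {
  if (expired.isEmpty()) {
    emptySelectionCount++;
    if (emptySelectionCount > 2 && candidates.size() > lastStoreFileCount) {
      LOG.warn("FIFO compaction found no expired files after " + emptySelectionCount
          + " attempts while the store grew to " + candidates.size()
          + " files; check the TTL and the write pattern.");
    }
  } else {
    emptySelectionCount = 0;
  }
  lastStoreFileCount = candidates.size();
}
{code}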

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944362#comment-14944362
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12765070/HBASE-14468-v5.patch
  against master branch at commit ceafa09d3cf6102d21c66745ca80e132021890c9.
  ATTACHMENT ID: 12765070

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.mapreduce.TestCopyTable.testRenameFamily(TestCopyTable.java:216)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15879//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15879//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15879//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15879//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
>  
> 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-05 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944100#comment-14944100
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

In my local perf tests on my laptop, I can easily exceed 100 MB/s sustained
write speed with a single-threaded test application.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch, HBASE-14468-v5.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> Make sure that the table has region splits disabled (either by explicitly
> setting DisabledRegionSplitPolicy or by setting ConstantSizeRegionSplitPolicy
> with a very large max region size). You will also have to increase the
> store's blocking file number, *hbase.hstore.blockingStoreFiles*, to a very
> large value.
> 
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-04 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942782#comment-14942782
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

{quote}
Yes, it looks like we can achieve FIFO behavior by using the existing
ExploringCompactionPolicy. We have to set the CF TTL, disable periodic major
compactions, and set the minimum files to compact to a very large value. But
even if it works, I would prefer to use a separate policy - it is
self-explanatory, at least
{quote}

No, we can't, because ExploringCompactionPolicy always checks whether the
number of store files is greater than the minimum number of files to compact,
and if it is less, no compaction is requested. Therefore we can't increase the
minimum files to compact to a very large value, and we need a separate
compaction policy for FIFO-style compaction.
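
A simplified sketch of the check in question (paraphrasing the ratio-based
selection logic, not the exact source):
{code}
// Simplified: ratio/exploring selection bails out when fewer candidate files
// exist than hbase.hstore.compaction.min, so pushing that minimum very high
// disables all compaction, including the expired-file cleanup FIFO needs.
if (candidateFiles.size() < comConf.getMinFilesToCompact()) {
  return new ArrayList<StoreFile>(); // no compaction requested
}
{code}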

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The
> column family MUST have a non-default TTL.
> Essentially, the FIFO compactor does only one job: it collects expired store
> files. I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: raw time-series
> vs. time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with the FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in the block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network) and we do not evict hot data from the block cache. The result:
> improved throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 





[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-02 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940816#comment-14940816
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

HBASE-14477 - now it's DateTieredCompaction :)

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939913#comment-14939913
 ] 

Ted Yu commented on HBASE-14468:


I think support for purging expired files is already in the code base.
If the TTL is short, the user can achieve FIFO compaction by adjusting the existing
compaction parameters.

During a scan, a time range can be specified so that expired HFiles are excluded.
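
For example, a client-side scan over only the last hour would look like this (a sketch; the one-hour window is arbitrary), letting the region server skip store files whose time range lies entirely outside the window:

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScanSketch {
  public static Scan lastHour() throws IOException {
    long now = System.currentTimeMillis();
    Scan scan = new Scan();
    // Only cells written within the last hour are returned; store files
    // whose time range ends before the window can be skipped entirely.
    scan.setTimeRange(now - 3600 * 1000L, now);
    return scan;
  }
}
{code}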

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-01 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940214#comment-14940214
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

[~tedyu] 
{quote}
I think support for purging expired files is already in the code base.
If the TTL is short, the user can achieve FIFO compaction by adjusting the existing
compaction parameters.
{quote}

Yes, it looks like we can achieve FIFO behavior by using the existing
ExploringCompactionPolicy. We have to set the CF TTL, disable periodic major
compactions, and set the minimum files to compact to a very large value. But even if
it works, I would prefer a separate policy - it is self-explanatory, at
least.
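
Concretely, the attempted workaround would look roughly like this (a sketch only, assuming the standard per-CF settings hbase.hregion.majorcompaction and hbase.hstore.compaction.min):

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;

public class FifoViaExploringSketch {
  public static HColumnDescriptor configure(byte[] family) {
    HColumnDescriptor cf = new HColumnDescriptor(family);
    cf.setTimeToLive(24 * 3600); // short CF TTL, in seconds
    // Disable periodic major compactions for this CF.
    cf.setConfiguration("hbase.hregion.majorcompaction", "0");
    // Raise the minimum number of files per compaction so high that
    // minor compactions effectively never trigger.
    cf.setConfiguration("hbase.hstore.compaction.min", "1000000");
    return cf;
  }
}
{code}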

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940769#comment-14940769
 ] 

Lars Hofhansl commented on HBASE-14468:
---

HBASE-14677 doesn't exist, though :)

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939307#comment-14939307
 ] 

Lars Hofhansl commented on HBASE-14468:
---

The other thing I want to do is tiered compactions along the time range. I.e., instead
of having a (major) compaction spit out a single file, we can have it emit a configurable
number of files based on time bands. Then we can (say) query the last week's worth of data
without touching many of the older files. But that's a different topic.
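
As a rough, hypothetical sketch of that idea (all names invented; this is not an HBase API): bucket the candidate files by a fixed time band and let the compactor emit one output file per band:

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.regionserver.StoreFile;

// Hypothetical: partition candidate files into fixed-width time bands so a
// major compaction can write one file per band instead of one big file.
// Queries over recent data then touch only the newest band(s).
public class TimeBandSketch {
  public static Map<Long, List<StoreFile>> bandify(List<StoreFile> files, long bandMs) {
    Map<Long, List<StoreFile>> bands = new HashMap<>();
    for (StoreFile sf : files) {
      long band = sf.getReader().getMaxTimestamp() / bandMs; // band index
      List<StoreFile> bucket = bands.get(band);
      if (bucket == null) {
        bucket = new ArrayList<>();
        bands.put(band, bucket);
      }
      bucket.add(sf);
    }
    return bands;
  }
}
{code}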

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939217#comment-14939217
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12764516/HBASE-14468-v4.patch
  against master branch at commit a463984945717bf9cb2881c3d586d5b11d192d65.
  ATTACHMENT ID: 12764516

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1787 checkstyle errors (more than the master's current 1781 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 3 zombie test(s):   
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot.testSnapshot(TestSnapshot.java:236)
at 
org.apache.hadoop.hbase.util.TestCoprocessorScanPolicy.testTTL(TestCoprocessorScanPolicy.java:157)
at 
org.apache.hadoop.hbase.client.TestFromClientSide.testUnmanagedHConnectionReconnect(TestFromClientSide.java:4081)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15837//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15837//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15837//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15837//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * 

[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-30 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939344#comment-14939344
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

There is a JIRA for that - HBASE-14677, [~lhofhansl] :)

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934776#comment-14934776
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Haven't looked at the patch, but the idea sounds great!
Does that mean we will essentially stay at memstore-sized HFiles until they are 
collected?


> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. One of the use cases for this policy 
> is when we need to store raw data which will be post-processed later and 
> discarded completely after quite a short period of time. Raw time-series vs.
> time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with FIFO compaction policy; periodically
> we run a task which creates rollup aggregates and compacts the time-series, and the
> original raw data can be discarded after that.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935434#comment-14935434
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

Yes, the FIFO compactor does only one job: it collects expired store files. I see many
applications for this policy:

# Use it for very high volume raw data which has a low TTL and which is the
source of other data (after additional processing). Example: raw event stream
(FIFO compaction) - compacted event stream (regular compaction)
# Use it for data which can be kept entirely in a block cache (RAM/SSD). Say
we have a local SSD (1TB) which we can use as a block cache. No need for
compaction of the raw data at all (see the cache configuration sketch below).
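
For the block-cache case, the relevant knobs would be along these lines (a sketch; the path and size are placeholders), pointing the file-backed bucket cache at the local SSD:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SsdBlockCacheSketch {
  public static Configuration configure() {
    Configuration conf = HBaseConfiguration.create();
    // Back the bucket cache with a file on the local SSD (placeholder path).
    conf.set("hbase.bucketcache.ioengine", "file:/mnt/ssd/hbase-bucket-cache");
    // Cache size in MB when the value is >= 1 (here ~700GB of the 1TB SSD).
    conf.set("hbase.bucketcache.size", "716800");
    return conf;
  }
}
{code}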


   

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. One of the use cases for this policy 
> is when we need to store raw data which will be post-processed later and 
> discarded completely after quite a short period of time. Raw time-series vs.
> time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with FIFO compaction policy; periodically
> we run a task which creates rollup aggregates and compacts the time-series, and the
> original raw data can be discarded after that.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935830#comment-14935830
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Thanks for the background. We have some use cases with TTL, but the TTL is measured
in months or years. I wonder if we can combine this with another compactor and/or
have a policy that under some conditions compacts anyway, even when there are
unexpired rows.
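
A hypothetical shape for such a hybrid (invented names; nothing like this exists in the code base): reclaim fully expired files for free when possible, and fall back to ordinary size-based selection otherwise, so a long-TTL CF never accumulates an unbounded number of small files:

{code}
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.hbase.regionserver.StoreFile;

// Hypothetical hybrid policy: FIFO-style reclamation when whole files have
// expired, normal size-based compaction when none have.
public abstract class HybridPolicySketch {
  abstract List<StoreFile> selectExpired(Collection<StoreFile> files, long ttlMs);
  abstract List<StoreFile> selectBySize(Collection<StoreFile> files);

  public List<StoreFile> select(Collection<StoreFile> files, long ttlMs) {
    List<StoreFile> expired = selectExpired(files, ttlMs);
    // Dropping expired files costs nothing; otherwise compact as usual so
    // unexpired rows are still merged down over time.
    return expired.isEmpty() ? selectBySize(files) : expired;
  }
}
{code}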

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935917#comment-14935917
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12764291/HBASE-14468-v3.patch
  against master branch at commit 37877e3f56b038c0821138862813e567390a9ff4.
  ATTACHMENT ID: 12764291

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.util.TestCoprocessorScanPolicy.testTTL(TestCoprocessorScanPolicy.java:157)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15810//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15810//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15810//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15810//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935982#comment-14935982
 ] 

Vladimir Rodionov commented on HBASE-14468:
---

[~lhofhansl], I do not think FIFO is a good policy for a TTL which is longer than a
couple of days, unless you can guarantee that all your data fits in the block cache
and stays there permanently.
I will update the standard ExploringCompactionPolicy implementation and will add
support for purging expired files.


> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # Use it for very high volume raw data which has a low TTL and which is the
> source of other data (after additional processing). Example: Raw
> time-series vs. time-based rollup aggregates and compacted time-series. We
> collect raw time-series and store them into a CF with FIFO compaction policy;
> periodically we run a task which creates rollup aggregates and compacts the
> time-series, and the original raw data can be discarded after that.
> # Use it for data which can be kept entirely in a block cache (RAM/SSD).
> Say we have a local SSD (1TB) which we can use as a block cache. No need for
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and
> network), and we do not evict hot data from the block cache. The result: improved
> throughput and latency for both writes and reads.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909946#comment-14909946
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12762618/HBASE-14468-v1.patch
  against master branch at commit 526520de0a9d7a29fcf1b4c521f017ca75a46cbc.
  ATTACHMENT ID: 12762618

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1789 checkstyle errors (more than the master's current 1787 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+LOG.info("[FIFOCompactionPolicy] Selected: "+ toCompact.size()+" 
asked: " + candidateFiles.size());
+  public boolean needsCompaction(Collection storeFiles, 
List filesCompacting) {

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mob.compactions.TestMobCompactor

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at 
org.apache.hadoop.hbase.TestPartialResultsFromClientSide.testReadPointAndPartialResults(TestPartialResultsFromClientSide.java:777)
at 
org.apache.hadoop.hbase.util.TestCoprocessorScanPolicy.testTTL(TestCoprocessorScanPolicy.java:157)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15779//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15779//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15779//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15779//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch
>
>
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. One of the use cases for this policy 
> is when we need to store raw data which will be post-processed later and 
> discarded completely after quite a short period of time. Raw time-series vs.
> time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with FIFO compaction policy; periodically
> we run a task which creates rollup aggregates and compacts the time-series, and the
> original raw data can be discarded after that.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910063#comment-14910063
 ] 

Hadoop QA commented on HBASE-14468:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12762631/HBASE-14468-v2.patch
  against master branch at commit 526520de0a9d7a29fcf1b4c521f017ca75a46cbc.
  ATTACHMENT ID: 12762631

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileScanning(TestHRegion.java:3890)
at 
org.apache.hadoop.hbase.util.TestCoprocessorScanPolicy.testTTL(TestCoprocessorScanPolicy.java:157)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15781//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15781//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15781//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15781//console

This message is automatically generated.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. One of the use cases for this policy 
> is when we need to store raw data which will be post-processed later and 
> discarded completely after quite a short period of time. Raw time-series vs.
> time-based rollup aggregates and compacted time-series. We collect raw
> time-series and store them into a CF with FIFO compaction policy; periodically
> we run a task which creates rollup aggregates and compacts the time-series, and the
> original raw data can be discarded after that.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if:
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)