subject:"\[jira\] \[Commented\] \(HBASE\-15181\) A simple implementation of date based tiered compaction"

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-04-28 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989087#comment-15989087
 ] 

Dave Latham commented on HBASE-15181:
-

[~tychang] it looks like there's now work going on to help opentsdb take 
advantage of this
https://github.com/OpenTSDB/opentsdb/pull/971

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-20 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932909#comment-15932909
 ] 

Dave Latham commented on HBASE-15181:
-

I don't know enough about the details of the data schema that opentsdb to say 
if it will benefit.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-20 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932869#comment-15932869
 ] 

Tianying Chang commented on HBASE-15181:


[~enis] that is a great point. [~davelatham] I am wondering will opentsdb 
benefit from it thought since I assume it set startRow/stoprow with start/stop 
time encoded in it. That should have already excluded all unnecessary data?   

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-16 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928898#comment-15928898
 ] 

Dave Latham commented on HBASE-15181:
-

Indeed that's true (and we do the same), but the application needs to be 
intelligent enough to use it.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-16 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928887#comment-15928887
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq. To take good advantage of it, the queries would need to set the time range 
on the scan itself (as opposed to purely encoding time information directly in 
the row/column identifiers and relying on them for time bounding). I'm not sure 
if opentsdb does that.
Excellent point. However, even if the actual time is embedded in the row/column 
model, an engine like opentsdb might be able to take advantage of this 
compaction strategy. The idea is that there should be a time bound error (lets 
call if {{E}}, where it is assumed that all data belonging to time {{T1}} has 
to be persisted. Then for queries for time ranges {{T1}} to {{T2}}, the engine 
can also set the time range on the scan object using {{T1-E}} to {{T2+E}}. This 
will provide both correctness (since the engine will still do filtering on the 
incoming data using T1 and T2, but the hbase scan will ignore data not in the 
range. Ambari metrics server does something like this. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-16 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928560#comment-15928560
 ] 

Tianying Chang commented on HBASE-15181:


[~davelatham] Good point about the set time range vs encoding the time 
information into the row. By opentsdb schema design, my gut feeling is it is 
not using set time range, unfortunately. I will verify it. 

If it is not using set time range, I guess since our opentsdb usage configured 
to only keep 28 days with TTL set to 28, one benefit is we can utilize is it 
can drop the whole store files with the expired TTL.  

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-15 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927057#comment-15927057
 ] 

Dave Latham commented on HBASE-15181:
-

It's been awhile since I looked at opentsdb's write/read patterns, but I do 
think date tiered compaction would be a great fit for time series data, 
especially if the engine is aware.  To take good advantage of it, the queries 
would need to set the time range on the scan itself (as opposed to purely 
encoding time information directly in the row/column identifiers and relying on 
them for time bounding).  I'm not sure if opentsdb does that.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-15 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927035#comment-15927035
 ] 

Tianying Chang commented on HBASE-15181:


[~davelatham] Thanks a lot for the information. Will look through the subtasks 
on HBASE-15339 also. Will give it a try to apply those patches to 1.2. My 
feeling is this kind of compaction algorithm should be very suitable for 
opentsdb use case, do you know if anyone had concrete experience on the 
improvement from this compaction algorithm on opentsdb? 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-14 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924910#comment-15924910
 ] 

Dave Latham commented on HBASE-15181:
-

The branch-1 patch likely would have applied to 1.2 as well when it was 
developed, but since 1.2.x patch releases should only have bug fixes, not new 
features like this it wasn't applied there.  I don't know if branch-1.2 has 
changed so that it would not apply.  If you do try, I would recommend also 
picking up the follow on work in subtasks of HBASE-15339

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2017-03-14 Thread Tianying Chang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924776#comment-15924776
 ] 

Tianying Chang commented on HBASE-15181:


[~clarax98007] Do you have patch for 1.2 also?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-06-29 Thread Mikhail Antonov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355819#comment-15355819
 ] 

Mikhail Antonov commented on HBASE-15181:
-

Sorry, wrong jira. was meant for HBASE-15454

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-06-29 Thread Mikhail Antonov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355816#comment-15355816
 ] 

Mikhail Antonov commented on HBASE-15181:
-

Ping. Since 1.3 was postponed several times due to various issue, curious if 
there's any update on that?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-04-15 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243150#comment-15243150
 ] 

Nick Dimiduk commented on HBASE-15181:
--

You folks see HBASE-15659 ? Maybe there's some metrics you'd want to expose for 
this one as well?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-14 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193566#comment-15193566
 ] 

ramkrishna.s.vasudevan commented on HBASE-15181:


Thank you for the info. That was useful.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-10 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189601#comment-15189601
 ] 

Clara Xiong commented on HBASE-15181:
-

There are a few tweaks made in this patch from Cassandra's design:
1. Use maxTimestamp instead of minTimestamp to favor late arriving data.
2. Plug in a compaction policy (default to exploring compaction)per window to 
reduce wasteful compaction.
3. Normalize timestamp to sequence id to guarantee contiguous compaction by seq 
id for correctness while paying some performance penalty for out-of-order data.

The subsequent work HBASE-15400 so far is to deal with major compaction and 
very large bulk load files so we can still maintain a date tiered layer. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-10 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189575#comment-15189575
 ] 

ramkrishna.s.vasudevan commented on HBASE-15181:


Just interested in knowing this feature. I had a quick look at the code. 
If you don't mind - a very naive question - what is the difference between the 
Date Tiered compaction in CAssandra and this one in HBase?  I can see 
subsequent JIRAs being actively pursued here but just wanted to get a note on 
that. I am trying to understand things from the design doc attached here but 
still wanted to get an idea. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-02 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176373#comment-15176373
 ] 

Clara Xiong commented on HBASE-15181:
-

Results from our production is added at 
https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results in our production is at 
> https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-02 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176320#comment-15176320
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

[~claraxiong]
{quote}
When it doesn't perform as well:
random gets without a time range
{quote}

I think we should educate users how to read data efficiently in case of 
multiple store files in a region:
*io.storefile.bloom.error.rate* must be decreased from default 0.01 to 0.001 or 
less (I would recommend 0.0001). This will reduce false positives in rowkey 
lookups accordingly but will increase space occupied by bloom filter in memory 
(of course).  

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176080#comment-15176080
 ] 

Hudson commented on HBASE-15181:


ABORTED: Integrated in HBase-0.98-matrix #305 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/305/])
HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 
41c04ee685b07321efe570fa91416ba90f8eeaa9)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176024#comment-15176024
 ] 

Hudson commented on HBASE-15181:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1179 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1179/])
HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 
41c04ee685b07321efe570fa91416ba90f8eeaa9)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-02 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175677#comment-15175677
 ] 

Clara Xiong commented on HBASE-15181:
-

Can this be added as release note?

Date tiered compaction policy is a date-aware store file layout that is 
beneficial for time-range scans for time-series data.

When it performs well:
- reads for limited time ranges, especially scans of recent data

When it doesn't perform as well:
- random gets without a time range
- frequent deletes and updates
- out of order data writes, especially writes with timestamps in the future
- bulk loads of historical data

Recommended configuration:
To turn on Date Tiered Compaction:
hbase.hstore.compaction.compaction.policy: 
org.apache.hadoop.hbase.regionserver.compactions.DateTieredCompactionPolicy 

Parameters for Date Tiered Compaction:
hbase.hstore.compaction.date.tiered.max.storefile.age.millis: Files with 
max-timestamp smaller than this will no longer be compacted.Default at 
Long.MAX_VALUE.
hbase.hstore.compaction.date.tiered.base.window.millis: base window size in 
milliseconds. Default at 6 hours.
hbase.hstore.compaction.date.tiered.windows.per.tier: number of windows per 
tier. Default at 4.
hbase.hstore.compaction.date.tiered.incoming.window.min: minimal number of 
files to compact in the incoming window. Set it to expected number of files in 
the window to avoid wasteful compaction. Default at 6.
hbase.hstore.compaction.date.tiered.window.policy.class: the policy to select 
store files within the same time window. It doesn’t apply to the incoming 
window. Default at exploring compaction. This is to avoid wasteful compaction.

With tiered compaction all servers in the cluster will promote windows to 
higher tier at the same time, so using a compaction throttle is recommended:
hbase.regionserver.throughput.controller:org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController

Because there will most likely be more store files around, we need to adjust 
the configuration so that flush won't be blocked and compaction will be 
properly throttled:
hbase.hstore.blockingStoreFiles: change to 50 if using all default parameters 
when turning on date tiered compaction. Use 1.5~2 x projected file count if 
changing the parameters, Projected file count = windows per tier x tier count + 
incoming window min + files older than max age

For more details, please refer to the design spec at 
https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit#



> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-02 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175628#comment-15175628
 ] 

Jean-Marc Spaggiari commented on HBASE-15181:
-

Will it be possible to add a release not with a description on how to enable 
that, etc.?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175207#comment-15175207
 ] 

Hudson commented on HBASE-15181:


FAILURE: Integrated in HBase-1.3 #583 (See 
[https://builds.apache.org/job/HBase-1.3/583/])
HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 
231a5807b4277020124ede1a9b44932e1048dfb6)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175114#comment-15175114
 ] 

Hudson commented on HBASE-15181:


FAILURE: Integrated in HBase-Trunk_matrix #749 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/749/])
HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 
bab8d1527b2d1a0d99095cee7e4191a9a7f57aea)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174956#comment-15174956
 ] 

Hudson commented on HBASE-15181:


SUCCESS: Integrated in HBase-1.3-IT #529 (See 
[https://builds.apache.org/job/HBase-1.3-IT/529/])
HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 
231a5807b4277020124ede1a9b44932e1048dfb6)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-01 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174911#comment-15174911
 ] 

Ted Yu commented on HBASE-15181:


Test failures were due to QA environment:

Caused by: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: Error while running command to get file permissions 
: ExitCodeException exitCode=127: /bin/ls: error while loading shared 
libraries: libacl.so.1: failed to map segment from shared object: Permission 
denied


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174858#comment-15174858
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 14s 
{color} | {color:red} hbase-server in master has 1 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
30m 21s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 27s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 3s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 201m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hbase.replication.TestReplicationSyncUpToolWithBulkLoadedData |
| JDK v1.8.0_72 Timed out junit tests | 
org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries |
|   | org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook |
|   | org.apache.hadoop.hbase.wal.TestWALFiltering |
|   | org.apache.hadoop.hbase.wal.TestDefaultWALProviderWithHLogKey |
|   | org.apache.hadoop.hbase.coprocessor.TestRegionServerObserver |
|   |

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-03-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174478#comment-15174478
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} 
| {color:red} HBASE-15181 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/latest/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790785/HBASE-15181-0.98-ADD.patch
 |
| JIRA Issue | HBASE-15181 |
| Powered by | Apache Yetus 0.1.0   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/781/console |


This message was automatically generated.



> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, 
> HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-branch-1.patch, 
> HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, 
> HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, 
> HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171112#comment-15171112
 ] 

Hudson commented on HBASE-15181:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1178 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1178/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
bc370c9a5d60045dd989955df55268c8773906cd)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
72169b4a8a88c2375f668cfd681aec905c063ba3)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch, 
> HBASE-15181-98.patch, HBASE-15181-branch-1.patch, 
> HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, 
> HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, 
> HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171109#comment-15171109
 ] 

Hudson commented on HBASE-15181:


FAILURE: Integrated in HBase-0.98-matrix #304 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/304/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
bc370c9a5d60045dd989955df55268c8773906cd)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
72169b4a8a88c2375f668cfd681aec905c063ba3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch, 
> HBASE-15181-98.patch, HBASE-15181-branch-1.patch, 
> HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, 
> HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, 
> HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170826#comment-15170826
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} 0.98 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} 0.98 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} 0.98 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 58s 
{color} | {color:red} hbase-server in 0.98 has 83 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 34s 
{color} | {color:red} hbase-server in 0.98 failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} 0.98 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 3m 
51s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 9s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 37s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 57m 56s 
{color} | {color:green} hbase-server in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 59m 7s 
{color} | {color:green} hbase-server in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
13s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 133m 46s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[lines 224-227] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1 Server=1.9.1 Image:yetus/hbase:date2016-02-28 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790333/HBASE-15181-0.98.patch
 |
| JIRA Issue | HBASE-15181 |
| Optional Tests |  asflicense  javac  javadoc  unit

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170770#comment-15170770
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
23s {color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} 0.98 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} 0.98 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} 0.98 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} 0.98 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 38s 
{color} | {color:red} hbase-server in 0.98 has 83 extant Findbugs warnings. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s 
{color} | {color:red} hbase-server in 0.98 failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} 0.98 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 6 new checkstyle issues in hbase-server 
(total was 29, now 34). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 18 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 11 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 3m 
27s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 49m 45s 
{color} | {color:green} hbase-server in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 52m 59s 
{color} | {color:green} hbase-server in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 123m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[lines 222-225] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170753#comment-15170753
 ] 

Hudson commented on HBASE-15181:


SUCCESS: Integrated in HBase-1.3-IT #524 (See 
[https://builds.apache.org/job/HBase-1.3-IT/524/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
0cd299dfca68a317a6d16fbafadfd43cebd7b20d)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, 
> HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170727#comment-15170727
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HBASE-15181 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/latest/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790321/HBASE-15181-98.patch |
| JIRA Issue | HBASE-15181 |
| Powered by | Apache Yetus 0.1.0   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/751/console |


This message was automatically generated.



> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-98.patch, HBASE-15181-branch-1.patch, 
> HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, 
> HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, 
> HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170693#comment-15170693
 ] 

Hudson commented on HBASE-15181:


SUCCESS: Integrated in HBase-1.3 #577 (See 
[https://builds.apache.org/job/HBase-1.3/577/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
0cd299dfca68a317a6d16fbafadfd43cebd7b20d)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170459#comment-15170459
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
27s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s 
{color} | {color:green} branch-1 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} branch-1 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 4m 
48s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 16s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 5s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 125m 50s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 212m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[lines 236-239] |
| JDK v1.8.0_72 Failed junit tests | hadoop.hbase.ipc.TestSimpleRpcScheduler |
| JDK v1.7.0_95 Failed junit tests | hadoop.hbase.mapreduce.TestImportExport |
|   | hadoop.hbase.master.procedure.TestModifyNamespaceProcedure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1 Server=1.9.1 Image:yetus/hbase:date2016-02-27 |
|

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170430#comment-15170430
 ] 

Hudson commented on HBASE-15181:


SUCCESS: Integrated in HBase-Trunk_matrix #744 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/744/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 
f7f96b9fb70f5b2243558cf531ab7fa51162e656)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
HBASE-15181 adds TestCompactionPolicy which was missing in first commit (tedyu: 
rev 03ffb30efe341c226a19b4e80ec0e3352e55806c)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Yong Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170322#comment-15170322
 ] 

Yong Zhang commented on HBASE-15181:


seems TestCompactionPolicy missed to push

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170252#comment-15170252
 ] 

Ted Yu commented on HBASE-15181:


Please fill in release notes.

Thanks

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170195#comment-15170195
 ] 

Enis Soztutar commented on HBASE-15181:
---

Ok, the hadoopqa run picked the previous result than. Anyway let's commit this. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170192#comment-15170192
 ] 

Ted Yu commented on HBASE-15181:


There was some rejected hunk for TestDefaultCompactSelection.java when I tried 
to apply patch v4 on branch-1.

Mind attaching patch for branch-1 ?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170125#comment-15170125
 ] 

Clara Xiong commented on HBASE-15181:
-

It seems neither @Nonnull or supressing warning works for the findbugs version 
for hadoopQA. The well-known solution is what I used: throw NPE for null input 
and it worked. The latest patch v4  has that, 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170089#comment-15170089
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq. And still need to add some new configurations to fit our requirements. Can 
open new issues when this getting in.
Sounds good. We can do them in follow up issues. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170063#comment-15170063
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq. java.lang.RuntimeException: Error while running command to get file 
permissions : ExitCodeException exitCode=127: /bin/ls: error while loading 
shared libraries: libpcre.so.3: failed to map segment from shared object: 
Permission denied
The node H0 maybe having problems for the unit tests. 

It seems that the findbugs warning did not take affect: 
https://builds.apache.org/job/PreCommit-HBASE-Build/729/artifact/patchprocess/new-findbugs-hbase-server.html



> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170040#comment-15170040
 ] 

Ted Yu commented on HBASE-15181:


Test failures don't seem to be related to the patch.

+1 from me.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, 
> HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170027#comment-15170027
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
53s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 59s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 9s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 129m 21s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 41s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
32s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 263m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[lines 236-239] |
| JDK v1.8.0_72 Timed out junit tests | 
org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient |
| JDK v1.7.0_95 Timed out junit tests | 
org.apache.hadoop.hbase.mapreduce.TestRowCounter |
|   | org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles |
|   |

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-26 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168754#comment-15168754
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
24m 24s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 7s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 19s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 14s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 226m 48s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[line 238] |
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hbase.regionserver.TestRegionServerMetrics |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hbase.regionserver.TestRegionServerMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1 Server=1.9.1 Image:yetus/hbase:date2016-02-26 |
| JIRA Patch URL |

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168473#comment-15168473
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
42s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
8s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 4m 21s 
{color} | {color:red} Patch generated 1 new checkstyle issues in hbase-server 
(total was 83, now 79). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
30m 18s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 15s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 142m 41s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 110m 30s 
{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 309m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[line 239] |
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hbase.client.TestBlockEvictionFromClient |
|   | hadoop.hbase.regionserver.TestRegionServerMetrics |
|   | hadoop.hbase.quotas.TestQuotaThrottle |
| JDK v1.8.0_72 Timed out junit tests |

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168302#comment-15168302
 ] 

Duo Zhang commented on HBASE-15181:
---

+1. This feature is very important for our MiCloud service(https://i.mi.com/, a 
service like iCloud). And still need to add some new configurations to fit our 
requirements.  Can open new issues when this getting in.

Thanks.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, 
> HBASE-15181-master-v2.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully. Time range overlapping among 
> store files is tolerated and the performance impact is minimized.
> Configuration can be set at hbase-site.xml or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168099#comment-15168099
 ] 

Hadoop QA commented on HBASE-15181:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
1s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
14s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 
6s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
58s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 18 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 39s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 18s 
{color} | {color:red} hbase-server introduced 1 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 107m 3s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 101m 2s {color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 257m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
|  |  storeFile must be non-null but is marked as nullable  At 
DateTieredCompactionPolicy.java:is marked as nullable  At 
DateTieredCompactionPolicy.java:[line 240] |
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hbase.regionserver.TestRegionServerMetrics |
|   | hadoop.hbase.replication.TestReplicationKillSlaveRS |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hbase.regionserver.TestRegionServerMetrics |
|   |

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167663#comment-15167663
 ] 

Enis Soztutar commented on HBASE-15181:
---

I think this is ready to go in. Let's do a hadoopqa run. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, HBASE-15181-v1.patch, 
> HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167460#comment-15167460
 ] 

Clara Xiong commented on HBASE-15181:
-

No, the change I made in the algorithm will handle those cases gracefully and 
perform as exploring compaction, as enis and I had discussed.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, HBASE-15181-v1.patch, 
> HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167430#comment-15167430
 ] 

Ted Yu commented on HBASE-15181:


bq. seqId and timestamp are in completely opposite orders

Is there metric showing the above scenario so that user can switch back to 
exploring compaction ?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-25 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167422#comment-15167422
 ] 

Clara Xiong commented on HBASE-15181:
-

[~tedyu]
The change in the algorithm is very minor and doesn't have perf impact in our 
case. We have some out-of-order data due to replication lag mostly much smaller 
than flush interval, even smaller than the base window. We are pushing the new 
patch to production since we don't want to fork. We will continuously collect 
metrics but I don't expect any difference. I will share the results when they 
are ready.

The most impacted cases are: 
1. seqId and timestamp are in completely opposite orders, most likely resulted 
from business logic.
2. Bulk load files carry -1 as seqId when user explicitly turn off 
"hbase.mapreduce.bulkload.assign.sequenceNumbers". 

I don't recommend this compaction policy for these cases. As the worst case 
scenarios, they will fall back to exploring compaction.

[~vrodionov]
Time-series data that are loaded periodically with minimal time range overlap 
will perform perfectly in this case with base window set to cover the interval. 
Some users may have occasional bulkload data that could be out of proportion of 
the files on the same tiers and they will need to pay some scan performance 
penalty. As time passes, they move to higher tier, the penalty will diminish.
 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-24 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163166#comment-15163166
 ] 

Ted Yu commented on HBASE-15181:


Since latest patch has revised algorithm significantly, a new round of 
performance verification on cluster is desirable.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-23 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160244#comment-15160244
 ] 

Anoop Sam John commented on HBASE-15181:


Exclusion of large files wont exclude in btw files.  We see a large file we 
break there.
{code}
private ArrayList skipLargeFiles(ArrayList candidates, 
boolean mayUseOffpeak) {
int pos = 0;
while (pos < candidates.size() && !candidates.get(pos).isReference()
  && (candidates.get(pos).getReader().length() > 
comConf.getMaxCompactSize(mayUseOffpeak))) {
  ++pos;
}
if (pos > 0) {
  LOG.debug("Some files are too large. Excluding " + pos
  + " files from compaction candidates");
  candidates.subList(0, pos).clear();
}
return candidates;
  }
{code}

Bulk load exclude excludes in btw bulk loaded files.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-23 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160223#comment-15160223
 ] 

Clara Xiong commented on HBASE-15181:
-

[~enis] Thank you for clarification. But my question was: right now, the 
default compaction policy may exclude bulkload file and large files by 
configuration. If a user turns either one on, some files will be excluded from 
compaction selection. That may result in compaction of non-contiguous files 
sorted by seqId. Will that not cause any MVCC issue?


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-23 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160139#comment-15160139
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq. But I want to add a tweak to this proposal on how to handle late-arriving 
data. I want to compact the out-of-order data with newer files other than older 
ones.
We are using the maxTs to select the tier, rather than minTs, so this is 
already true?  

bq. We have observed drastic IO reduction for the scans.
Great. This will be a nice addition to HBase.
Let me look at the patch. 

bq. I am wondering how we have maintained mvcc with Ratio-basedCompactionPolicy 
and its derived class ExploringCompactonPolicy when we allow filtering 
bulk-load and skip large files?
We are assigning (bulk load) a seqId to the bulk loaded files at the time of 
the bulk load. We execute a flush beforehand to make sure that the sequenceId 
that is assigned is not overlapping with the in-memory data's sequenceIds. 
We have a store-level read/write lock that coordinates bulk load files and file 
selection for compaction. Is this what you were asking for? 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-23 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160057#comment-15160057
 ] 

Ted Yu commented on HBASE-15181:


Clara:
Mind attaching your patch here for QA run ?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-23 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160026#comment-15160026
 ] 

Clara Xiong commented on HBASE-15181:
-

I updated the design spec  and uploaded the new patch at 
https://reviews.apache.org/r/43114/. Major changes:
1. sort by seq id. A new test was added.
2. handling future data. A new test was added.
3. other CR feedbacks

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-22 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158362#comment-15158362
 ] 

Clara Xiong commented on HBASE-15181:
-

[~enis]I am wondering how we have maintained mvcc with 
Ratio-basedCompactionPolicy and its derived class ExploringCompactonPolicy when 
we allow filtering bulk-load and skip large files? I followed them to make the 
behavior consistent. But I wonder whether that will allow non-contiguous 
compaction. By default, they are turned off.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-18 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153385#comment-15153385
 ] 

Clara Xiong commented on HBASE-15181:
-

[~enis][~vrodionov]  I appreciate all the callouts and suggestions greatly. I 
agree we have to guarantee correctness and I think the first proposal will 
carry the smaller trade-off.

We want to construct the tiers to maximize performance for  time-range scan on 
time-series data.The scan api are typically based on look back window based on 
data timestamps. So we want to use timestamp instead of creation time to align 
the tiers to the scans. And creation time may not be as reliable as sequence id 
as a monochronic indicator.  

[~vrodionov] I carefully thought about Enis' first proposal. It should work 
since we know which tier and compaction window a store file belongs to as long 
as we know the current time and the file's maxTimestamp. We don't need the 
sequenceId to build the tiers.

But I want to add a tweak to this proposal on how to handle late-arriving data. 
I want to compact the out-of-order data with newer files other than older ones. 
Since we don’t write future data, the worst scenario is that file on the lower 
tier  have long tails instead of the data goes to higher tier. The additional 
cost of long tail is the cost to scan newer and smaller files. Given the tiered 
design, we only need to scan additional data at most the tail size + current 
window size.This will also reduce the chance of recompacting small files to an 
out-of -portion file.  

For the bulk load scenarios, currently bulk-load file carries 0 as sequenceId 
which will land them at the highest tier. It is configurable to use the 
sequenceId at the time of creation which will land them at the lower tiers. We 
will need to call out that user wants to decide based on the data timestamps 
relatively to the tiers and the access pattern. 

Please let me know what you think.
 
[~enis]We do have performance results from production on a very large cluster 
replicated across multiple DC serving many concurrent time-range scans of 
different look-back windows. We are collecting more. I will share them 
externally once they are ready, most likely next week. We have observed drastic 
IO reduction for the scans.





> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-17 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151574#comment-15151574
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

One major use case still requires additional attention with OLDEST_CREATION_TS 
approach: 

Initial bulk load of a large data set?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-17 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151558#comment-15151558
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq. Then we can use this OLDEST_CREATION_TIME instead of maxTimestamp. We will 
use date ranges for OLDEST_CREATION_TIME - not for date timestamps, 
You are right. The CREATE_TIME_TS gets updated when the files are compacted. If 
we keep OLDEST_CREATION_TS similar to how we keep seqIds for the newly 
compacted files, and use OLDEST_CREATION_TS to select the tier, it should work. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-17 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151553#comment-15151553
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

Sequence IDs can not be tiered. Yes? They are not time ranges

I think we need to enhance HFile/compactions to keep the oldest HFile creation 
time (which is equals to creation time of a newly flashed store file or the 
oldest creation time of files being compacted).

Then we can use this OLDEST_CREATION_TIME instead of maxTimestamp. We will use 
date ranges for  OLDEST_CREATION_TIME - not for date timestamps

There is one to one correlation between Seq  Id and OLDEST_CREATION_TIME: the 
ordering by OLDEST_CREATION_TIME and by sequence ID is the same.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-17 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151531#comment-15151531
 ] 

Enis Soztutar commented on HBASE-15181:
---

[~claraxiong] this is great work BTW. Thanks for pushing for this. 

I just wanted to bring one open item back to jira to see whether ordering files 
with timestamps, rather than seqid, and doing non-contiguous is acceptable: 
{quote}
The tiered structure is built completely and solely on the data timestamp of 
the store files. We cannot sort by segId at all. Any logic for updates/deletes 
depending on seqId would break. The user needs to guarantee updates or deletes 
are in order aligned with time stamp order. This compaction policy is pluggable 
and this limitation will be lifted if the work to allow compaction out of order 
of seqId is done. As you pointed out in the ticket: "What I was saying offline 
is that we can actually do something like HBASE-9905 and disallow 
client-settable timestamps, or do something like HBASE-10247 where the table 
pre-declares that we won't have same-ts edits, it should be possible to do 
non-contigous compactions."
{quote}

Given that there is no hard-guarantees as of now about whether the client can 
do out of order timestamp writes, can we still always be correct, but if the 
client does an excessive amount of these writes, the compaction will not 
perform as efficiently. Basically, if we can, I would like a system where the 
client will get the full benefit automatically if the timestamps follow seqId 
order, but if not, the results are still correct. If there are occasional 
out-of-order writes, the performance is not that badly affected, if not, the 
compaction algorithm can behave badly. 

I think we can achieve this with something like this: 
 - Use max ts as in the design for store files. 
 - Instead of ordering files by decreasing ts, order files by decreasing seqId. 
 - Iterating from highest seqId to lowest, find the tier that the file belongs 
to using maxTs. The only difference from the current algorithm is that in the 
iteration, we should always assign tiers in increasing order t0, t1, t2. This 
means that if out of order data is present, and we end up with flushes where 
maxTs is very old, lets say it falls into t2, then t1 and t0 would be empty and 
all files will be t2+. Otherwise (if you do not have out of order writes, or 
have them occasionally) the behavior will be the same as in the design. 

Alternatively HFiles also have CREATE_TIME_TS, which is different than 
maxTimestamp. maxTS comes from the user data, while hfile create time is the 
system time at the time of hfile writing. If we do the tier selection based on 
hfile time instead of users maxTs, then we might not even have that problem. 
Again, if there is actual correlation of user's timestamps with the seqIds (or 
hfile create times), you would get all the benefits, otherwise, we would still 
return the correct results, but compaction may not be optimal (I think it will 
be like falling back to exploring one). Anyway, just a suggestion to consider. 
I might not have thought of all corner cases. 

You are saying that this patch is also in production. Are there any numbers 
you've collected? 


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-17 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150900#comment-15150900
 ] 

Clara Xiong commented on HBASE-15181:
-

Thank you. I have incorporated and will upload a new patch shortly once some of 
the open question on RB are resolved.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-17 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150705#comment-15150705
 ] 

Clara Xiong commented on HBASE-15181:
-

Sorry I meant I read your comments and the HStore codes. You are right. [~enis]

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-16 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150019#comment-15150019
 ] 

Clara Xiong commented on HBASE-15181:
-

Read the comments and code. It is not needed. Will remove that change.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-16 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149793#comment-15149793
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq.  My estimate is it will take significant expansion to make it work with 
time boundaries and it would be less effort and cleaner if I create a new 
compactor
Alright, we do not want to duplicate code and work that's why we were 
suggesting to consider that. I did not take a look at it closely to say what 
would be involved, so we'll defer to your judgement. 

bq.  I can break this patch up to two patches: one for dynamic configuration 
per column family and the other for the pluggable DateTieredCompactionPolicy.
Are you talking about CompoundConfiguration? Did you see my comments above 
regarding that? If the CompoundConfiguration changes are needed, agreed that, 
it can go in separately. 


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-09 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139645#comment-15139645
 ] 

Clara Xiong commented on HBASE-15181:
-

Two updates:
1. Thanks to [~stack] and [~enis] I went through the StripeCompactor code to 
see whether I can leverage the code for multiple output to chop the files by 
tier boundaries. My estimate is it will take significant expansion to make it 
work with time boundaries and it would be less effort and cleaner if I create a 
new compactor.
2. Thanks to [~vrodionov] [~stack] and [~enis] After the discussion on the 
reason we need to compact contiguously, I realized we have a hole in the 
algorithm. I sort the files by client defined max time stamp, not sequence id. 
Although the algorithm still only select contiguous store files  , they are not 
contiguous on seq id. The new changes for seq id  will make it work. I can 
break this patch up to two patches: one for dynamic configuration per column 
family and the other for the pluggable DateTieredCompactionPolicy. 

Please let me know what you think.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-04 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133258#comment-15133258
 ] 

Ted Yu commented on HBASE-15181:


Please incorporate the above and submit new patch for QA run.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-04 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133182#comment-15133182
 ] 

Clara Xiong commented on HBASE-15181:
-

Yes， that serves the same purpose with the properly added check.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-03 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131670#comment-15131670
 ] 

Ted Yu commented on HBASE-15181:


w.r.t. the nested loop in getBuckets(), do you think the following is more 
readable ?
{code}
while (it.hasNext()) {
  if (!target.onTarget(it.peek().getSecond())) {
// If the file is too new for the target, skip it.
if (target.compareToTimestamp(it.peek().getSecond()) < 0) {
  it.next();
} else {
  // If the file is too old for the target, switch to higher
  // tier.
  target = target.nextTarget(tierBase);
}
  }

  ArrayList bucket = Lists.newArrayList();
  // Add all files of the same tier to current bucket
  while (it.hasNext() && target.onTarget(it.peek().getSecond())) {
bucket.add(it.next().getFirst());
  }
  if (!bucket.isEmpty()) {
buckets.add(bucket);
  }
}
{code}
Basically there is no need for the label.
I ran TestTieredCompaction and it passed.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-02 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129144#comment-15129144
 ] 

Clara Xiong commented on HBASE-15181:
-

Uploaded to RB at https://reviews.apache.org/r/43114/.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-02 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129366#comment-15129366
 ] 

Clara Xiong commented on HBASE-15181:
-

Thank you for your input. I will update the spec. 

We have date tiering like the spotify blog but with  tweaks. The overall theme 
is tiered store files based on the max time stamp for their data. If new data 
comes in perfect order, there is only one file per window except the incoming 
window or when they move across tiers. If there is out-of-order  data/late 
arrival/bulk loaded data, they will fall into older windows for compaction 
selection and may trigger compactions. We allow user to plug in a compaction 
policy for this case, default at exploring compaction, to reduce compaction 
storms. Other policy can be plugged in if user want to keep file count small by 
paying the price of re-compaction.

We use max time stamp instead of min as in the spotify blog to reduce 
performance penalty for out-of-order data, assuming no timestamp will be set to 
future time.

TTL works out of box by skipping the whole files.

Major compaction is disabled except pushed manually.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-02 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129372#comment-15129372
 ] 

Clara Xiong commented on HBASE-15181:
-

With this implementation, we have an overall compaction policy and a per-window 
compaction policy which is default to exploring policy. So I have to use two 
separate configuration.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127037#comment-15127037
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

Looking into the code:

Why is this method generic and static?
{code}
 static  List getBuckets(Collection> files, long 
timeUnit,
  int tierBase, long now) {
}
{code}

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127052#comment-15127052
 ] 

Clara Xiong commented on HBASE-15181:
-

Not really. The algorithm depends on the max time stamp.  If the bulk load file 
has a time span significantly bigger than the window, the scan performance will 
suffer by the extra data being scanned. But once the files are moved to larger 
windows on higher tiers, the penalty decreases and finally disappear. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126893#comment-15126893
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

Thanks for the patch, [~claraxiong]

You still rely on default major compaction for bulk loaded files, periodic 
major compactions are disabled, therefore the only way to compact bulk loaded 
files is to force major compaction manually. For many applications the new 
compaction policy won't give much benefit - they periodically do batch load and 
they will have to run major compaction after on a daily basis. Have you thought 
about that?


> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126906#comment-15126906
 ] 

Clara Xiong commented on HBASE-15181:
-

[~vrodionov]I am thinking about a multiple-output solution for major compaction 
so the output will be laid out perfectly. It can be easily plugged into this 
solution. If you have anything that works, could you share?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126952#comment-15126952
 ] 

Dave Latham commented on HBASE-15181:
-

Definitely seems to depend on the pattern of loads.  If most bulk loads are for 
data for a limited time range then the files will naturally move into 
appropriate windows, and scans for recent data should include or skip them as 
appropriate.  If the bulk loads cover data over the full history then scans for 
recent data will end up including them and touching the old data they have as 
well until the file ages, so some sort of re-compacting (such as doing a major 
compaction splitting the data into windowed output files) would seem to help.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127096#comment-15127096
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

{quote}
Can you elaborate on why bulk loaded files on a limited time range require 
major compaction
{quote}

According to RatioBasedCompactionPolicy selectCompaction:
{code}
if (!isTryingMajor && !isAfterSplit) {
  // We're are not compacting all files, let's see what files are applicable
  candidateSelection = filterBulk(candidateSelection);
  candidateSelection = applyCompactionPolicy(candidateSelection, 
mayUseOffPeak, mayBeStuck);
  candidateSelection = checkMinFilesCriteria(candidateSelection);
}
{code}

we always filter bulk loaded files out in minor compactions. The reason why is 
not clear to me. May be some our gurus can answer this 
question? [~enis], [~saint@gmail.com], [~apurtell]? 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127112#comment-15127112
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

[~claraxiong], why is it TieredCompactionPolicy and not 
DateTieredCompactionPolicy?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127166#comment-15127166
 ] 

Clara Xiong commented on HBASE-15181:
-

Whether to skip bulk load file for minor compaction is configurable by 
"hbase.mapreduce.hfileoutputformat.compaction.exclude". This was put in for the 
following issues.
https://issues.apache.org/jira/browse/HBASE-3690
https://issues.apache.org/jira/browse/HBASE-3404

By default it is off for the existing compaction policies to avoid compaction 
storms.  We may want to recommend people to turn it on for date-based tiered 
compaction. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127134#comment-15127134
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

The original idea of DateTieredCompactionPolicy (as in HBASE-14477) was to 
improve read of most recent data and to reduce overall compaction-related IO. 
The proposed simple implementation will meet these requirements  only for 
applications w/o periodic data bulk loading and for mostly in-order data 
streams (that is probably use case at Yahoo?)

Periodic data bulk loading and significant out -of -order data streams reduces 
the value of this implementation significantly. Before we can move on with 
TCP/DTCP we should figure out how to solve these problems, may be in a separate 
JIRA, as since handling bulk loaded data is not TCP - specific but generic 
approach.

Two questions:

# Why is bulk loaded data excluded from minor compaction
# Why we can not select non-contiguous range of store files for compaction? 

 

 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127195#comment-15127195
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

Yes, you are right. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127284#comment-15127284
 ] 

Dave Latham commented on HBASE-15181:
-

Those are great questions, Vladimir - I hope people who know better chime in 
and confirm / deny.  Poking at the issues that Clara linked, I can speculate:

The default compaction algorithm at the time (Ratio based?) was intended to 
handle stores with a shape where the older files were larger than recent files, 
so it sounds the policy would not intelligently handle a smaller bulk load file 
that is sorted to be oldest and end up doing a large wasteful compaction.

For contiguous-only compactions, I think that's because the sequence IDs are 
only stored per-file, not per-cell.  So if you want to compare two Cell 
sequence IDs for identical keys, then you need to have a strict ordering on 
HFile sequence IDs.  If you compact out-of-order HFiles, then you don't have 
strictly ordered sequence IDs any more.

Both speculation.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Dave Latham (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127087#comment-15127087
 ] 

Dave Latham commented on HBASE-15181:
-

Thanks for the feedback and review, Vladimir.  Can you elaborate on why bulk 
loaded files on a limited time range require major compaction?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127088#comment-15127088
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

{quote}
Not really.
{quote}

This is not TieredCompactionPolicy - specific issue per se, but ...

let us take a look at the following scenario (one of our customers, by the way):

We bulk load data periodically once a day and that is it. Every time new store 
file is created (per region/cf) which is not eligible for minor compaction. The 
only way to compact them is to force major compaction periodically. To compact 
new 10 files (for the last 10 days) you will need to compact all region (may be 
months/years of data). 

As I pointed out this is not TCP - specific issue - it exists for all other 
SizeTiered -based compaction policies.  
 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127398#comment-15127398
 ] 

Enis Soztutar commented on HBASE-15181:
---

In HFileOutputFormat2 we are doing this: 
{code}
final boolean compactionExclude = conf.getBoolean(
"hbase.mapreduce.hfileoutputformat.compaction.exclude", false);
...
  w.appendFileInfo(StoreFile.EXCLUDE_FROM_MINOR_COMPACTION_KEY,
  Bytes.toBytes(compactionExclude));
{code}
So, it seems that we do allow bulk load files to be minor-compacted by default. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127426#comment-15127426
 ] 

stack commented on HBASE-15181:
---

Good stuff [~vrodionov] You too [~claraxiong]. Suggest that a sentence on bulk 
load concern make it into release notes.

Nice writeup... 

What is the 'Sliding Window Tiered Compaction'? I don't know what it is so I 
don't know why "...sliding windows may trigger compaction too frequently and 
cause some files to be re-compacted." Is it the "...fully mimic the behavior of 
STCS and compact SSTables with a relative age difference less than a constant 
factor. " from the cited spotiify document?

I see no engagement with stripe compactions in the write up. Were they 
considered at all (stripe purportedly does best when the data is timeseries 
shaped). Would be good to at least call out how this differs.

Suggest you give more direct credit to 
https://labs.spotify.com/2014/12/18/date-tiered-compaction/ . You do so for the 
image copied but I find I have to read the original to understand what is being 
proposed and it seems like a bunch of the notions and text comes from it.

bq. And at the time of recovery, we may need to bulk load data. 

Which recovery is this? And who is doing the bulk load?

Do major compactions run in the date tiered scheme?

As per [~vrodionov], need the 'date' qualifier on configuration names... 

So, we have date tiering like the spotify blog compacting in first tier if 
above configured threshold. For other tiers, we do default exploring 
compactions. If bulk load, it can ruin our tiering but we'll just drop it in 
the tier that has its oldest timestamp? Major compactions does all tiers but 
the newest? (And when dated tiered, should be easier to drop whole files if 
TTL?)

Let me look at the patch.










> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127380#comment-15127380
 ] 

Enis Soztutar commented on HBASE-15181:
---

HBASE-7763 is the jira that talks about why we need to select contiguous set of 
files for compaction. The main idea is that if two puts happen with the same 
timestamp, we are ordering them using the sequenceId so that the "latest" one 
is returned always. This allows the user to override a previously set value for 
example in some cases. 

The problem with non-contiguous compactions is that, we do not keep the seqids 
of cells forever. After some time, we remove per-cell seqIds and only keep 1 
sequenceId per hfile. Thus if we end up with two different puts having 
different seqIds in files, but with same timestamp, then allowing 
non-contiguous compactions may break the ordering. 

For example: 
{code}
file1: seqId=10, row=foo, val=v1 ts = 100 
file2: seqId=20, row=bar, val=v2, ts=200
file3: seqId=30, row=foo, val=v3, ts = 100 
file4: seqId=40, row=bar, val=v4, ts=300
{code}
If I compact file1 and file4 together, then the new file will have

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Clara Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127314#comment-15127314
 ] 

Clara Xiong commented on HBASE-15181:
-

As to the second question: my implementation sorts store files by max 
timestamp. It is desirable not to compact non-contiguous range of store files 
to maximize the scan performance by reducing time range overlap. 

As to the more generic question about out-of-order data streams, they are 
assigned to the compaction windows based on their max time stamps, not sequence 
id. We have to pay some scan performance penalty only if the flush file 
contains much wider time range compared to the compaction windows. A generic 
solution for this would be splitting off data out of compaction window using 
multiple output for compactor. As stated in the design spec, we don't see 
enough benefit for this solution, yet. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127491#comment-15127491
 ] 

stack commented on HBASE-15181:
---

Sounds right to me [~davelatham]

Ok keep the sequenceid around always [~enis].. .then we could do non-contiguous 
compactions and respect order in which edits were added.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Vladimir Rodionov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126993#comment-15126993
 ] 

Vladimir Rodionov commented on HBASE-15181:
---

{quote}
 If most bulk loads are for data for a limited time range then the files will 
naturally move into appropriate windows, and scans for recent data should 
include or skip them as appropriate.
{quote}

Still requires major compaction (of all files in a region). 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127610#comment-15127610
 ] 

Enis Soztutar commented on HBASE-15181:
---

bq. Ok keep the sequenceid around always Enis Soztutar.. .then we could do 
non-contiguous compactions and respect order in which edits were added.
Yes. I think the mvcc unification made this 2 weeks, then Lars made it to be 1 
day or so because of extra overhead. Bad thing is that we do not want to keep 2 
8-byte objects (ts and seqId) per cell. Time to unify seqId with ts. 

Is this needed? We already do the CompoundConfig in HStore from 
{{family.getConfiguration()}}. 
{code}
+this.conf = new CompoundConfiguration().add(conf)
+.addBytesMap(storeConfigInfo.getHColumnDescriptor().getValues());
{code}

Can you also put up the patch at RB. it would be easier that way. 

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-02-01 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127619#comment-15127619
 ] 

Enis Soztutar commented on HBASE-15181:
---

Can we re-use the already existing min threshold rather than introduce 
{{hbase.hstore.compaction.tiered.min.threshold}}? The semantics would be 
applied the same. 

bq. I see no engagement with stripe compactions in the write up. Were they 
considered at all (stripe purportedly does best when the data is timeseries 
shaped). Would be good to at least call out how this differs.
This is a very good point. Is there a way we can override how stripes are done 
(instead of row-range based stripes, we have tiered ranges) and have it share 
the same code? Maybe a pipe dream. cc [~sershe].  

We are actually doing multiple-output-files in stripe compaction policy for 
compactions and for L0 flushes.  

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-01-28 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122082#comment-15122082
 ] 

Ted Yu commented on HBASE-15181:


Is this in production ?
If so, can you share performance numbers ?

75public static final String MAX_AGE = CONFIG_PREFIX + 
"tiered.max.storefile.age";
76public static final String TIME_UNIT = CONFIG_PREFIX + 
"tiered.time.unit";
77public static final String TIER_BASE = CONFIG_PREFIX + 
"tiered.tier.base";
78public static final String MIN_THRESHOLD = CONFIG_PREFIX + 
"tiered.min.threshold";

Please add javadoc for the parameters above.
Normally such constants end with '_KEY'

TieredCompactionPolicy.java needs Apache license. Please add annotation for 
audience and class javadoc.

Putting the next patch on review board would facilitate reviewing.

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction

2016-01-28 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122174#comment-15122174
 ] 

Ted Yu commented on HBASE-15181:


I went over the linked doc.
bq. For other tiers, we apply the exploring compaction using a small file count.
Can you give a bit more detail on this small file count ? Does it appear in the 
tables at the end of the doc ?

How do you handle bulk loaded hfiles (in terms of maintaining window 
boundaries) ?

> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Clara Xiong
>Assignee: Clara Xiong
> Fix For: 2.0.0
>
> Attachments: HBASE-15181-v1.patch
>
>
> This is a simple implementation of date-based tiered compaction similar to 
> Cassandra's for the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based 
> tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> Perfect fit for the use cases that:
> 1. has mostly date-based date write and scan and a focus on the most recent 
> data. 
> 2. never or rarely deletes data.
> Out-of-order writes are handled gracefully so the data will still get to the 
> right store file for time-range-scan and re-compacton with existing store 
> file in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance 
> impact is minimized.
> Configuration can be set at hbase-site or overriden at per-table or 
> per-column-famly level by hbase shell.
> Design spec is at 
> https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

1 2 >

1 - 100 of 106 matches

Mail list logo