[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989087#comment-15989087 ] Dave Latham commented on HBASE-15181:
[~tychang] It looks like there's now work going on to help opentsdb take advantage of this: https://github.com/OpenTSDB/opentsdb/pull/971

> A simple implementation of date based tiered compaction
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-ADD.patch, HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
> This is a simple implementation of date-based tiered compaction, similar to Cassandra's, with the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully. Time range overlap among store files is tolerated, and the performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table or per-column-family level via the hbase shell.
> Design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing
> Results from our production are at https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
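The description does not spell out how the date-based tiered layout is carved into windows. As a rough illustration only, here is a minimal Python sketch assuming exponentially growing windows: the newest windows have a base size, and after every `windows_per_tier` windows the size is multiplied by `windows_per_tier`. The function and parameter names are hypothetical, and the window alignment is deliberately simplified; HBase's actual policy may compute boundaries differently.

```python
def tiered_windows(now_ms, base_ms, windows_per_tier, max_age_ms):
    """Return half-open [start, end) windows, newest first, reaching back
    at least max_age_ms before now_ms (simplified, hypothetical layout)."""
    if base_ms <= 0 or windows_per_tier < 2:
        raise ValueError("need base_ms > 0 and windows_per_tier >= 2")
    # Incoming window: the base-sized window that contains now_ms.
    start = (now_ms // base_ms) * base_ms
    windows = [(start, start + base_ms)]
    size, count_in_tier = base_ms, 1
    oldest = now_ms - max_age_ms
    # Walk backwards in time until the oldest window reaches `oldest`.
    while start > oldest:
        if count_in_tier == windows_per_tier:
            # Tier is full: older windows are windows_per_tier times larger.
            size *= windows_per_tier
            count_in_tier = 0
        start, end = start - size, start
        windows.append((start, end))
        count_in_tier += 1
    return windows


def window_for(ts_ms, windows):
    """Index of the window containing ts_ms, or None if it is too old."""
    for i, (start, end) in enumerate(windows):
        if start <= ts_ms < end:
            return i
    return None


if __name__ == "__main__":
    print(tiered_windows(now_ms=100, base_ms=10, windows_per_tier=2, max_age_ms=100))
    # [(100, 110), (90, 100), (70, 90), (50, 70), (10, 50), (-30, 10)]
```

With these toy parameters the window sizes are 10, 10, 20, 20, 40, 40: recent data sits in small windows, while older data falls into progressively larger windows that, under this scheme, would be rewritten less and less often.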
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932909#comment-15932909 ] Dave Latham commented on HBASE-15181:
I don't know enough about the details of the data schema that opentsdb uses to say whether it will benefit.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932869#comment-15932869 ] Tianying Chang commented on HBASE-15181:
[~enis] That is a great point. [~davelatham] I am wondering whether opentsdb will benefit from it, though, since I assume it sets startRow/stopRow with the start/stop time encoded in them. Shouldn't that already exclude all unnecessary data?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928898#comment-15928898 ] Dave Latham commented on HBASE-15181:
Indeed that's true (and we do the same), but the application needs to be intelligent enough to use it.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928887#comment-15928887 ] Enis Soztutar commented on HBASE-15181:
bq. To take good advantage of it, the queries would need to set the time range on the scan itself (as opposed to purely encoding time information directly in the row/column identifiers and relying on them for time bounding). I'm not sure if opentsdb does that.
Excellent point. However, even if the actual time is embedded in the row/column model, an engine like opentsdb might be able to take advantage of this compaction strategy. The idea is that there should be a time bound error (let's call it {{E}}), where it is assumed that all data belonging to time {{T1}} is persisted within {{E}} of {{T1}}. Then for queries over the time range {{T1}} to {{T2}}, the engine can also set the time range on the scan object to {{T1-E}} through {{T2+E}}. This provides both correctness, since the engine still filters the incoming data using {{T1}} and {{T2}}, and efficiency, since the HBase scan ignores data outside the widened range. Ambari Metrics Server does something like this.
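Enis's suggestion can be sketched in a few lines of Python. The sketch assumes, as one reading of the comment, that each cell's HBase timestamp lies within `E` of the event time the application encodes in the row key; `Point`, `scan_bounds`, and `query` are hypothetical names. In a real client the widened bounds would be passed to `Scan.setTimeRange(minStamp, maxStamp)` (minimum inclusive, maximum exclusive), while the exact filter stays in the engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    event_ms: int    # time encoded in the row key by the application
    cell_ts_ms: int  # HBase cell timestamp, assumed within E of event_ms


def scan_bounds(t1_ms, t2_ms, e_ms):
    """Widened [min, max) cell-timestamp bounds for a query over [t1, t2).

    If |cell_ts - event| <= E, every point with t1 <= event < t2 has
    t1 - E <= cell_ts < t2 + E, so no matching point is pruned.
    """
    return max(0, t1_ms - e_ms), t2_ms + e_ms


def query(points, t1_ms, t2_ms, e_ms):
    lo, hi = scan_bounds(t1_ms, t2_ms, e_ms)
    # Coarse step: what the scan's time range would let HBase skip.
    candidates = [p for p in points if lo <= p.cell_ts_ms < hi]
    # Exact step: the engine still filters on the event time it encoded.
    return [p for p in candidates if t1_ms <= p.event_ms < t2_ms]
```

The margin is the only tuning knob here: if `E` is smaller than the real write delay, late points are silently pruned, while a larger `E` only costs reading a few more store files.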
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928560#comment-15928560 ] Tianying Chang commented on HBASE-15181:
[~davelatham] Good point about setting the time range vs. encoding the time information into the row. Given opentsdb's schema design, my gut feeling is that it does not set the time range, unfortunately. I will verify that. If it does not, one benefit we could still get is this: since our opentsdb deployment is configured to keep only 28 days of data (TTL set to 28 days), whole store files whose data has passed the TTL can be dropped.
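To illustrate why a date-tiered layout helps with a 28-day TTL, here is a minimal Python sketch, assuming each store file records the minimum and maximum timestamp of its cells; the class and function names are hypothetical and this is not HBase's actual expiry code. A file can be dropped wholesale only if even its newest cell is past the TTL cutoff, which is common when files hold disjoint date ranges and rare when every file mixes old and new data.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class StoreFileInfo:
    name: str
    min_ts_ms: int  # oldest cell in the file
    max_ts_ms: int  # newest cell in the file


def split_by_ttl(files, now_ms, ttl_ms):
    """Return (droppable, kept) file names.

    A file is droppable only if its newest cell is older than the cutoff;
    any file with at least one live cell must be kept (and rewritten by a
    compaction if its expired cells are to be removed).
    """
    cutoff = now_ms - ttl_ms
    droppable = [f.name for f in files if f.max_ts_ms < cutoff]
    kept = [f.name for f in files if f.max_ts_ms >= cutoff]
    return droppable, kept
```

For example, at day 100 with a 28-day TTL, tiered files covering days 0-30, 30-60, 60-80 and 80-100 let the first two be dropped outright, while two mixed files covering days 0-100 and 10-95 both have to be kept.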
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927057#comment-15927057 ] Dave Latham commented on HBASE-15181:
It's been a while since I looked at opentsdb's write/read patterns, but I do think date tiered compaction would be a great fit for time series data, especially if the engine is aware of it. To take good advantage of it, the queries would need to set the time range on the scan itself (as opposed to purely encoding time information directly in the row/column identifiers and relying on them for time bounding). I'm not sure if opentsdb does that.
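Dave's point about setting the time range on the scan can be illustrated with a small Python model, assuming each store file tracks the min/max timestamp of its cells and that files whose range does not overlap the scan's time range are skipped. `FileRange` and `files_to_read` are hypothetical names; this models the idea, not HBase's read path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRange:
    name: str
    min_ts_ms: int  # inclusive
    max_ts_ms: int  # inclusive


def files_to_read(files, scan_min_ms=None, scan_max_ms=None):
    """Names of files whose [min_ts, max_ts] overlaps the scan's [min, max).

    With no time range (row-key bounds only), every file must be read,
    because nothing tells the reader which files hold the wanted times.
    """
    if scan_min_ms is None or scan_max_ms is None:
        return [f.name for f in files]
    return [f.name for f in files
            if f.max_ts_ms >= scan_min_ms and f.min_ts_ms < scan_max_ms]
```

With a date-tiered layout of files covering 0-49, 50-89 and 90-99, a scan over `[95, 100)` touches only the newest file; the same scan without a time range reads all three, and a layout where every file spans nearly the whole range gains nothing from the time range either.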
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927035#comment-15927035 ] Tianying Chang commented on HBASE-15181:
[~davelatham] Thanks a lot for the information. I will also look through the subtasks of HBASE-15339 and try applying those patches to 1.2. My feeling is that this kind of compaction algorithm should be very suitable for the opentsdb use case. Do you know if anyone has concrete experience with the improvement this compaction algorithm gives on opentsdb?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924910#comment-15924910 ] Dave Latham commented on HBASE-15181:
The branch-1 patch likely would have applied to 1.2 as well when it was developed, but since 1.2.x patch releases should only contain bug fixes, not new features like this, it wasn't applied there. I don't know whether branch-1.2 has since changed so that it would no longer apply. If you do try, I would recommend also picking up the follow-on work in the subtasks of HBASE-15339.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15924776#comment-15924776 ] Tianying Chang commented on HBASE-15181:
[~clarax98007] Do you also have a patch for 1.2?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355819#comment-15355819 ] Mikhail Antonov commented on HBASE-15181:
Sorry, wrong JIRA. This was meant for HBASE-15454.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355816#comment-15355816 ] Mikhail Antonov commented on HBASE-15181:
Ping. Since 1.3 has been postponed several times due to various issues, I'm curious whether there's any update on that.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243150#comment-15243150 ] Nick Dimiduk commented on HBASE-15181:
Have you folks seen HBASE-15659? Maybe there are some metrics you'd want to expose for this one as well?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193566#comment-15193566 ] ramkrishna.s.vasudevan commented on HBASE-15181:
Thank you for the info. That was useful.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189601#comment-15189601 ] Clara Xiong commented on HBASE-15181: - This patch makes a few tweaks to Cassandra's design:
1. Use maxTimestamp instead of minTimestamp to favor late-arriving data.
2. Plug in a compaction policy per window (defaults to exploring compaction) to reduce wasteful compaction.
3. Normalize timestamps to sequence ids to guarantee contiguous compaction by sequence id for correctness, at the cost of some performance for out-of-order data.
The follow-up work so far, HBASE-15400, deals with major compaction and very large bulk-load files so that we can still maintain a date-tiered layout.
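Tweak 1 can be illustrated with a small, hypothetical Java sketch. This is not code from the patch. It assigns a store file to a fixed 6-hour window using either the file's min or its max timestamp. The fixed window size, the `windowIndex` helper, and the timestamps are assumptions for illustration only. The real policy uses tiered, aligned windows.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: bucket a store file into a fixed-size time window.
final class WindowAssignment {
    // Assumed fixed window size (6 hours, the default base window); the real policy tiers its windows.
    static final long WINDOW_MILLIS = TimeUnit.HOURS.toMillis(6);

    // Index of the fixed-size window that contains the given timestamp.
    static long windowIndex(long timestampMillis) {
        return Math.floorDiv(timestampMillis, WINDOW_MILLIS);
    }

    public static void main(String[] args) {
        long now = TimeUnit.DAYS.toMillis(100);           // hypothetical "now"
        long minTs = now - TimeUnit.DAYS.toMillis(2);     // a late-arriving cell from two days ago
        long maxTs = now - TimeUnit.MINUTES.toMillis(5);  // the freshest cell in the same file
        System.out.println("keyed by minTimestamp: window " + windowIndex(minTs));
        System.out.println("keyed by maxTimestamp: window " + windowIndex(maxTs));
    }
}
```

Keyed by minTimestamp, this file with a two-day-old late cell falls into window 392. Keyed by maxTimestamp, it stays in window 399 alongside the other recent files.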
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189575#comment-15189575 ] ramkrishna.s.vasudevan commented on HBASE-15181: I'm interested in this feature and had a quick look at the code. If you don't mind a very naive question: what is the difference between the Date Tiered compaction in Cassandra and this one in HBase? I can see follow-up JIRAs being actively pursued here, but I wanted to get a note on that. I am trying to understand things from the design doc attached here, but still wanted to get an overview.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176373#comment-15176373 ] Clara Xiong commented on HBASE-15181: - Results from our production environment have been added at https://docs.google.com/document/d/1GqRtQZMMkTEWOijZc8UCTqhACNmdxBSjtAQSYIWsmGU/edit#
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176320#comment-15176320 ] Vladimir Rodionov commented on HBASE-15181: --- [~claraxiong] {quote} When it doesn't perform as well: random gets without a time range {quote} I think we should educate users on how to read data efficiently when a region has multiple store files: *io.storefile.bloom.error.rate* must be decreased from the default 0.01 to 0.001 or less (I would recommend 0.0001). This reduces false positives in row-key lookups accordingly, but of course it also increases the memory occupied by bloom filters.
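The following Java sketch shows the trade-off in the bloom filter comment. It uses the textbook optimal Bloom filter sizing, m/n = -ln(p) / (ln 2)^2 bits per key with k = (m/n) ln 2 hash functions. HBase's own sizing code may round differently, so treat the numbers as estimates. The `expectedFalseSeeks` helper and the 30-file store are hypothetical. The helper gives a pessimistic bound for a random get that probes every store file, none of which holds the row.

```java
// Textbook Bloom filter sizing; an estimate, not HBase's exact allocation.
final class BloomSizing {
    // Optimal bits per key for false-positive rate p: m/n = -ln(p) / (ln 2)^2
    static double bitsPerKey(double p) {
        return -Math.log(p) / (Math.log(2) * Math.log(2));
    }

    // Optimal number of hash functions for that size: k = (m/n) * ln 2
    static int hashCount(double p) {
        return (int) Math.round(bitsPerKey(p) * Math.log(2));
    }

    // Expected number of store files that pass the bloom check without holding the row,
    // assuming none of the probed files contains it (pessimistic bound for a random get).
    static double expectedFalseSeeks(int storeFiles, double p) {
        return storeFiles * p;
    }

    public static void main(String[] args) {
        int storeFiles = 30; // hypothetical store file count in one store
        for (double p : new double[] {0.01, 0.001, 0.0001}) {
            System.out.printf("p=%.4f bits/key=%.2f hashes=%d expected false seeks=%.4f%n",
                p, bitsPerKey(p), hashCount(p), expectedFalseSeeks(storeFiles, p));
        }
    }
}
```

Under this model, going from 0.01 to 0.0001 exactly doubles the bits per key (about 9.6 to about 19.2). It also cuts the expected false-positive seeks per get by a factor of 100.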
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176080#comment-15176080 ] Hudson commented on HBASE-15181: ABORTED: Integrated in HBase-0.98-matrix #305 (See [https://builds.apache.org/job/HBase-0.98-matrix/305/]) HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 41c04ee685b07321efe570fa91416ba90f8eeaa9) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176024#comment-15176024 ] Hudson commented on HBASE-15181: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1179 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1179/]) HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 41c04ee685b07321efe570fa91416ba90f8eeaa9) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175677#comment-15175677 ] Clara Xiong commented on HBASE-15181: - Can this be added as a release note?

The date tiered compaction policy produces a date-aware store file layout that benefits time-range scans of time-series data.

When it performs well:
- reads for limited time ranges, especially scans of recent data

When it doesn't perform as well:
- random gets without a time range
- frequent deletes and updates
- out-of-order data writes, especially writes with timestamps in the future
- bulk loads of historical data

Recommended configuration:

To turn on Date Tiered Compaction:
hbase.hstore.compaction.compaction.policy: org.apache.hadoop.hbase.regionserver.compactions.DateTieredCompactionPolicy

Parameters for Date Tiered Compaction:
hbase.hstore.compaction.date.tiered.max.storefile.age.millis: files whose max timestamp is older than this age (that is, earlier than now minus this value) will no longer be compacted. Default: Long.MAX_VALUE.
hbase.hstore.compaction.date.tiered.base.window.millis: base window size in milliseconds. Default: 6 hours.
hbase.hstore.compaction.date.tiered.windows.per.tier: number of windows per tier. Default: 4.
hbase.hstore.compaction.date.tiered.incoming.window.min: minimum number of files to compact in the incoming window. Set it to the expected number of files in the window to avoid wasteful compaction. Default: 6.
hbase.hstore.compaction.date.tiered.window.policy.class: the policy used to select store files within the same time window. It doesn't apply to the incoming window. Default: exploring compaction, to avoid wasteful compaction.
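The following minimal Java sketch shows how base.window.millis and windows.per.tier interact. It assumes a simplified Cassandra-style model, not the patch's exact logic. In this model, each tier holds windows.per.tier windows, and each tier's windows are windows.per.tier times wider than those of the previous tier. The sketch ignores window alignment and the separate incoming window. The `TierLayout` class and the 30-day data span are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical model of a date-tiered window layout (no alignment, no incoming window).
final class TierLayout {

    // Widths of the windows needed to cover spanMillis of data, newest first.
    // Tier k holds windowsPerTier windows, each baseWindowMillis * windowsPerTier^k wide.
    // Intended for realistic spans (days to years); astronomically large spans would overflow long.
    static List<Long> windowWidths(long baseWindowMillis, int windowsPerTier, long spanMillis) {
        if (baseWindowMillis <= 0 || windowsPerTier < 1) {
            throw new IllegalArgumentException("base window must be > 0 and windowsPerTier >= 1");
        }
        List<Long> widths = new ArrayList<>();
        long width = baseWindowMillis;
        long covered = 0;
        while (covered < spanMillis) {
            for (int i = 0; i < windowsPerTier && covered < spanMillis; i++) {
                widths.add(width);
                covered += width;
            }
            width *= windowsPerTier; // the next tier's windows are windowsPerTier times wider
        }
        return widths;
    }

    // Number of tiers touched when covering spanMillis: every tier but the last is full.
    static int tierCount(long baseWindowMillis, int windowsPerTier, long spanMillis) {
        int windows = windowWidths(baseWindowMillis, windowsPerTier, spanMillis).size();
        return (windows + windowsPerTier - 1) / windowsPerTier;
    }

    public static void main(String[] args) {
        long base = TimeUnit.HOURS.toMillis(6);  // default base.window.millis
        int perTier = 4;                         // default windows.per.tier
        long span = TimeUnit.DAYS.toMillis(30);  // hypothetical: 30 days of data
        for (long w : windowWidths(base, perTier, span)) {
            System.out.print(TimeUnit.MILLISECONDS.toHours(w) + "h ");
        }
        System.out.println();
        System.out.println("tiers: " + tierCount(base, perTier, span));
    }
}
```

With the defaults and 30 days of data, this model produces 13 windows across 4 tiers: four 6h, four 24h, four 96h, and one 384h window. That tier count is the kind of input the projected file count formula in the note needs.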
With tiered compaction, all servers in the cluster promote windows to a higher tier at the same time, so using a compaction throttle is recommended:
hbase.regionserver.throughput.controller: org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController

Because there will most likely be more store files around, we need to adjust the configuration so that flushes won't be blocked and compaction will be properly throttled:
hbase.hstore.blockingStoreFiles: change to 50 if using all default parameters when turning on date tiered compaction. If you change the parameters, use 1.5 to 2 times the projected file count, where:
Projected file count = windows per tier x tier count + incoming window min + files older than max age

For more details, please refer to the design spec at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit#
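The hbase.hstore.blockingStoreFiles guidance in the release note above can be checked with a few lines of Java. This sketch only encodes the note's projected file count formula and its suggested margin of 1.5 to 2 times that count. The tier count and the number of files older than max age are hypothetical inputs; you would estimate both for your own data. The fixed value of 50 for all-default parameters comes from the note itself and is not derived here.

```java
// Encodes the projected-file-count formula from the release note; inputs are estimates.
final class BlockingStoreFiles {
    // windows per tier x tier count + incoming window min + files older than max age
    static int projectedFileCount(int windowsPerTier, int tierCount,
                                  int incomingWindowMin, int filesOlderThanMaxAge) {
        return windowsPerTier * tierCount + incomingWindowMin + filesOlderThanMaxAge;
    }

    // Suggested range for hbase.hstore.blockingStoreFiles: 1.5x (rounded up) to 2x the projection.
    static int[] blockingStoreFilesRange(int projected) {
        int low = (int) Math.ceil(projected * 1.5);
        int high = projected * 2;
        return new int[] {low, high};
    }

    public static void main(String[] args) {
        // Hypothetical inputs: default windows per tier (4) and incoming window min (6),
        // 4 tiers of data, and no files older than max age (default max age is Long.MAX_VALUE).
        int projected = projectedFileCount(4, 4, 6, 0);
        int[] range = blockingStoreFilesRange(projected);
        System.out.println("projected=" + projected
            + " blockingStoreFiles in [" + range[0] + ", " + range[1] + "]");
    }
}
```

With these inputs, the projection is 22 files, so the formula suggests a blocking threshold between 33 and 44.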
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175628#comment-15175628 ] Jean-Marc Spaggiari commented on HBASE-15181: - Would it be possible to add a release note describing how to enable this, etc.?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175207#comment-15175207 ] Hudson commented on HBASE-15181: FAILURE: Integrated in HBase-1.3 #583 (See [https://builds.apache.org/job/HBase-1.3/583/]) HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 231a5807b4277020124ede1a9b44932e1048dfb6) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175114#comment-15175114 ] Hudson commented on HBASE-15181: FAILURE: Integrated in HBase-Trunk_matrix #749 (See [https://builds.apache.org/job/HBase-Trunk_matrix/749/]) HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev bab8d1527b2d1a0d99095cee7e4191a9a7f57aea) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174956#comment-15174956 ] Hudson commented on HBASE-15181: SUCCESS: Integrated in HBase-1.3-IT #529 (See [https://builds.apache.org/job/HBase-1.3-IT/529/]) HBASE-15181 Addendum fixes findbugs warning (Clara Xiong) (tedyu: rev 231a5807b4277020124ede1a9b44932e1048dfb6) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174911#comment-15174911 ] Ted Yu commented on HBASE-15181: Test failures were due to the QA environment:
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=127: /bin/ls: error while loading shared libraries: libacl.so.1: failed to map segment from shared object: Permission denied
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174858#comment-15174858 ] Hadoop QA commented on HBASE-15181: ---
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 14s {color} | {color:red} hbase-server in master has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} master passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} master passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 30m 21s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 27s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 3s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 201m 10s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | hadoop.hbase.replication.TestReplicationSyncUpToolWithBulkLoadedData |
| JDK v1.8.0_72 Timed out junit tests | org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries |
| | org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook |
| | org.apache.hadoop.hbase.wal.TestWALFiltering |
| | org.apache.hadoop.hbase.wal.TestDefaultWALProviderWithHLogKey |
| | org.apache.hadoop.hbase.coprocessor.TestRegionServerObserver |
| |
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174478#comment-15174478 ] Hadoop QA commented on HBASE-15181: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| -1 | patch | 0m 7s | HBASE-15181 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/latest/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790785/HBASE-15181-0.98-ADD.patch |
| JIRA Issue | HBASE-15181 |
| Powered by | Apache Yetus 0.1.0 http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/781/console |

This message was automatically generated.
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98-ADD.patch, HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
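The issue summary describes structuring store files in a date-based tiered layout, similar to Cassandra's. The following is a minimal sketch of how such window boundaries could be computed. It assumes a base window size and a fixed number of windows per tier, both hypothetical parameters; this is not necessarily how DateTieredCompactionPolicy names or computes them. Windows are aligned to multiples of their own size, and once a window start lines up with the next tier's boundary, the window size grows by the windows-per-tier factor.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of date-tiered window boundaries (not HBase's code).
 * Each window is a half-open [start, end) range in milliseconds, newest first.
 */
class TieredWindows {

  /** Returns up to maxWindows contiguous windows, starting with the one containing now. */
  static List<long[]> windows(long now, long baseMillis, int windowsPerTier, int maxWindows) {
    List<long[]> out = new ArrayList<>();
    long size = baseMillis;                 // window size of the current tier
    long index = Math.floorDiv(now, size);  // index of the window holding "now"
    while (out.size() < maxWindows) {
      long start = index * size;
      out.add(new long[] {start, start + size});
      if (Math.floorMod(index, (long) windowsPerTier) != 0) {
        // Not aligned to the coarser tier yet: step back one window of the same size.
        index -= 1;
      } else {
        // start is a multiple of size * windowsPerTier, so the next earlier
        // window belongs to the coarser tier and ends exactly at start.
        size *= windowsPerTier;
        index = Math.floorDiv(index, (long) windowsPerTier) - 1;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Base window of 10 ms, 2 windows per tier, evaluated at t = 130 ms.
    for (long[] w : windows(130, 10, 2, 6)) {
      System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
  }
}
```

Running `main` prints six contiguous windows that together cover [0, 140). The two newest windows are 10 ms wide, the next two are 20 ms, and the last two are 40 ms. Older data therefore ends up in progressively larger windows, so it is merged less often.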
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171112#comment-15171112 ] Hudson commented on HBASE-15181:
FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1178 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1178/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev bc370c9a5d60045dd989955df55268c8773906cd)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 72169b4a8a88c2375f668cfd681aec905c063ba3)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
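The first benefit listed in the issue summary is faster date-range scans. In a tiered layout, each store file's timestamps fall within a narrow range, so a scan limited to a time range can skip files that cannot contain matching cells. The following is a minimal sketch. It assumes each file records its minimum and maximum cell timestamp and treats the scan range as half-open. `FileRange` and `filesToRead` are hypothetical names, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of time-range file pruning (not HBase's scanner code). */
class TimeRangePruning {

  /** A store file summarized by its inclusive min/max cell timestamps. */
  static final class FileRange {
    final String name;
    final long minTs;
    final long maxTs;

    FileRange(String name, long minTs, long maxTs) {
      this.name = name;
      this.minTs = minTs;
      this.maxTs = maxTs;
    }
  }

  /** Names of files whose [minTs, maxTs] overlaps the scan range [scanMin, scanMax). */
  static List<String> filesToRead(List<FileRange> files, long scanMin, long scanMax) {
    List<String> out = new ArrayList<>();
    for (FileRange f : files) {
      if (f.minTs < scanMax && f.maxTs >= scanMin) {
        out.add(f.name);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<FileRange> files = new ArrayList<>();
    files.add(new FileRange("a", 0, 39));
    files.add(new FileRange("b", 40, 79));
    files.add(new FileRange("c", 80, 99));
    files.add(new FileRange("d", 100, 139));
    System.out.println(filesToRead(files, 90, 120)); // prints [c, d]
  }
}
```

A file that received out-of-order writes spans a wider [minTs, maxTs] range. Such a file is read by more scans, which is one reason overlap can be tolerated but still carries some cost.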
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171109#comment-15171109 ] Hudson commented on HBASE-15181:
FAILURE: Integrated in HBase-0.98-matrix #304 (See [https://builds.apache.org/job/HBase-0.98-matrix/304/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev bc370c9a5d60045dd989955df55268c8773906cd)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 72169b4a8a88c2375f668cfd681aec905c063ba3)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultStoreEngine.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98.patch, HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
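The issue summary also claims improved TTL efficiency. Here is one way a time-ordered layout can help. If even the newest cell in a file is older than the TTL cutoff, the whole file has expired and can be dropped without being read or rewritten by a compaction. The following is a minimal sketch, assuming each file records its maximum cell timestamp. `FileInfo` and `expiredFiles` are hypothetical names, not HBase's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of whole-file TTL expiry (not HBase's implementation). */
class TtlExpiry {

  /** A store file summarized by the newest cell timestamp it contains. */
  static final class FileInfo {
    final String name;
    final long maxTs;

    FileInfo(String name, long maxTs) {
      this.name = name;
      this.maxTs = maxTs;
    }
  }

  /** Names of files in which every cell is older than now - ttlMillis. */
  static List<String> expiredFiles(List<FileInfo> files, long now, long ttlMillis) {
    long cutoff = now - ttlMillis;
    List<String> out = new ArrayList<>();
    for (FileInfo f : files) {
      // If the newest cell is past the cutoff, every older cell is too.
      if (f.maxTs < cutoff) {
        out.add(f.name);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<FileInfo> files = new ArrayList<>();
    files.add(new FileInfo("a", 39));
    files.add(new FileInfo("b", 79));
    files.add(new FileInfo("c", 139));
    System.out.println(expiredFiles(files, 200, 100)); // prints [a, b]
  }
}
```

When files are grouped by time, fully expired data tends to sit in its own files. When data is mixed, a single file is more likely to hold both expired and live cells, which requires a rewrite to reclaim space.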
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170826#comment-15170826 ] Hadoop QA commented on HBASE-15181: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| +1 | mvninstall | 1m 21s | 0.98 passed |
| +1 | compile | 0m 39s | 0.98 passed with JDK v1.8.0_72 |
| +1 | compile | 0m 33s | 0.98 passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 17s | 0.98 passed |
| +1 | mvneclipse | 0m 15s | 0.98 passed |
| -1 | findbugs | 1m 58s | hbase-server in 0.98 has 83 extant Findbugs warnings. |
| -1 | javadoc | 0m 34s | hbase-server in 0.98 failed with JDK v1.8.0_72. |
| +1 | javadoc | 0m 37s | 0.98 passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 41s | the patch passed |
| +1 | compile | 0m 41s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 41s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 3m 51s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.0. |
| -1 | findbugs | 2m 9s | hbase-server introduced 1 new FindBugs issues. |
| -1 | javadoc | 0m 37s | hbase-server in the patch failed with JDK v1.8.0_72. |
| +1 | javadoc | 0m 36s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 57m 56s | hbase-server in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 59m 7s | hbase-server in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 13s | Patch does not generate ASF License warnings. |
| | | 133m 46s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[lines 224-227] |

|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1 Server=1.9.1 Image:yetus/hbase:date2016-02-28 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790333/HBASE-15181-0.98.patch |
| JIRA Issue | HBASE-15181 |
| Optional Tests | asflicense javac javadoc unit
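The new FindBugs issue in these reports, "storeFile must be non-null but is marked as nullable", means that a parameter named `storeFile` in DateTieredCompactionPolicy is declared nullable but is dereferenced without a null check. The report does not show the flagged code. The following is therefore only a sketch of one common remedy: a null guard inside a filtering predicate. `FileInfo`, `olderThan`, and `countOlder` are hypothetical, and the sketch uses `java.util.function.Predicate` rather than whatever interface the patch actually implements. The other remedy is to drop the nullable contract if null can never be passed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of a null-tolerant store-file filter. */
class NullSafeFilter {

  /** Stand-in for a store file; only the newest cell timestamp matters here. */
  static final class FileInfo {
    final long maxTs;

    FileInfo(long maxTs) {
      this.maxTs = maxTs;
    }
  }

  /**
   * Matches files whose newest cell is older than cutoff. A null input is
   * treated as "no match" instead of being dereferenced, which is what a
   * nullable parameter contract requires.
   */
  static Predicate<FileInfo> olderThan(long cutoff) {
    return storeFile -> storeFile != null && storeFile.maxTs < cutoff;
  }

  /** Counts matching files; null entries are skipped by the predicate. */
  static long countOlder(List<FileInfo> files, long cutoff) {
    return files.stream().filter(olderThan(cutoff)).count();
  }

  public static void main(String[] args) {
    List<FileInfo> files = Arrays.asList(new FileInfo(5), null, new FileInfo(50));
    System.out.println(countOlder(files, 10)); // prints 1
  }
}
```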
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170770#comment-15170770 ] Hadoop QA commented on HBASE-15181: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 5 new or modified test files. |
| +1 | mvninstall | 7m 23s | 0.98 passed |
| +1 | compile | 0m 26s | 0.98 passed with JDK v1.8.0_72 |
| +1 | compile | 0m 29s | 0.98 passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 19s | 0.98 passed |
| +1 | mvneclipse | 0m 16s | 0.98 passed |
| -1 | findbugs | 1m 38s | hbase-server in 0.98 has 83 extant Findbugs warnings. |
| -1 | javadoc | 0m 26s | hbase-server in 0.98 failed with JDK v1.8.0_72. |
| +1 | javadoc | 0m 28s | 0.98 passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 26s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 26s | the patch passed |
| +1 | compile | 0m 29s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 29s | the patch passed |
| -1 | checkstyle | 0m 13s | Patch generated 6 new checkstyle issues in hbase-server (total was 29, now 34). |
| +1 | mvneclipse | 0m 14s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 18 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | whitespace | 0m 0s | The patch has 11 line(s) with tabs. |
| +1 | hadoopcheck | 3m 27s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.0. |
| -1 | findbugs | 1m 54s | hbase-server introduced 1 new FindBugs issues. |
| -1 | javadoc | 0m 25s | hbase-server in the patch failed with JDK v1.8.0_72. |
| +1 | javadoc | 0m 28s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 49m 45s | hbase-server in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 52m 59s | hbase-server in the patch passed with JDK v1.7.0_95. |
| -1 | asflicense | 0m 19s | Patch generated 1 ASF License warnings. |
| | | 123m 10s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[lines 222-225] |

|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170753#comment-15170753 ] Hudson commented on HBASE-15181:
SUCCESS: Integrated in HBase-1.3-IT #524 (See [https://builds.apache.org/job/HBase-1.3-IT/524/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 0cd299dfca68a317a6d16fbafadfd43cebd7b20d)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-0.98.v4.patch, HBASE-15181-98.patch, HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
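The issue summary states that out-of-order writes are handled gracefully and that overlapping time ranges among store files are tolerated. The following is a minimal sketch of one way a policy can stay robust to such overlap. It assumes files are assigned to windows by their maximum timestamp alone; the summary does not state the exact rule. Under this rule, a file that received a few late, old cells keeps its place in a recent window and does not force a reshuffle. `assign` and its parallel-array inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: bucket store files into time windows by max timestamp. */
class WindowAssignment {

  /**
   * windowStarts must be sorted newest first. Each file goes to the newest
   * window whose start is <= the file's max timestamp; files older than
   * every window fall back to the oldest one. Returns window start -> names.
   */
  static Map<Long, List<String>> assign(long[] windowStarts, String[] names, long[] maxTs) {
    Map<Long, List<String>> buckets = new LinkedHashMap<>();
    for (long start : windowStarts) {
      buckets.put(start, new ArrayList<>());
    }
    for (int i = 0; i < names.length; i++) {
      long chosen = windowStarts[windowStarts.length - 1]; // fallback: oldest window
      for (long start : windowStarts) {
        if (maxTs[i] >= start) {
          chosen = start;
          break;
        }
      }
      buckets.get(chosen).add(names[i]);
    }
    return buckets;
  }

  public static void main(String[] args) {
    long[] starts = {120, 100, 80};
    // "late" holds mostly new cells plus a few old ones; only its max
    // timestamp (125) decides its placement, not its old minimum.
    String[] names = {"late", "f1", "f2"};
    long[] maxTs = {125, 110, 95};
    System.out.println(assign(starts, names, maxTs)); // prints {120=[late], 100=[f1], 80=[f2]}
  }
}
```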
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170727#comment-15170727 ] Hadoop QA commented on HBASE-15181: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| -1 | patch | 0m 4s | HBASE-15181 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/latest/precommit-patchnames for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790321/HBASE-15181-98.patch |
| JIRA Issue | HBASE-15181 |
| Powered by | Apache Yetus 0.1.0 http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/751/console |

This message was automatically generated.
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-98.patch, HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
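The issue summary notes that the configuration can be set in hbase-site.xml and overridden per table or per column family from the HBase shell. The following is a minimal sketch of that layering. It assumes the most specific level wins: column family over table over site. The `example.*` keys are placeholders, not HBase's actual property names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of layered configuration overrides. */
class LayeredConfig {

  /** Merges layers in order; a key set in a later (more specific) layer wins. */
  @SafeVarargs
  static Map<String, String> merge(Map<String, String>... layers) {
    Map<String, String> effective = new HashMap<>();
    for (Map<String, String> layer : layers) {
      effective.putAll(layer);
    }
    return effective;
  }

  public static void main(String[] args) {
    Map<String, String> site = new HashMap<>();   // cluster-wide defaults
    site.put("example.compaction.policy", "ratio");
    site.put("example.windows.per.tier", "4");

    Map<String, String> table = new HashMap<>();  // per-table override
    table.put("example.windows.per.tier", "8");

    Map<String, String> family = new HashMap<>(); // per-column-family override
    family.put("example.compaction.policy", "date-tiered");

    Map<String, String> effective = merge(site, table, family);
    System.out.println(effective.get("example.compaction.policy")); // prints date-tiered
    System.out.println(effective.get("example.windows.per.tier"));  // prints 8
  }
}
```

A key that no layer overrides keeps its site value. Only the keys set at a narrower scope change behavior for that table or family.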
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170693#comment-15170693 ] Hudson commented on HBASE-15181:
SUCCESS: Integrated in HBase-1.3 #577 (See [https://builds.apache.org/job/HBase-1.3/577/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev 0cd299dfca68a317a6d16fbafadfd43cebd7b20d)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170459#comment-15170459 ] Hadoop QA commented on HBASE-15181: ---
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 6m 27s | branch-1 passed |
| +1 | compile | 0m 44s | branch-1 passed with JDK v1.8.0_72 |
| +1 | compile | 0m 35s | branch-1 passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 25s | branch-1 passed |
| +1 | mvneclipse | 0m 22s | branch-1 passed |
| +1 | findbugs | 2m 2s | branch-1 passed |
| +1 | javadoc | 0m 43s | branch-1 passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 34s | branch-1 passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 51s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 46s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvneclipse | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 4m 48s | Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.0. |
| -1 | findbugs | 2m 16s | hbase-server introduced 1 new FindBugs issues. |
| +1 | javadoc | 0m 36s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 38s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 20m 5s | hbase-server in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 125m 50s | hbase-server in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | Patch does not generate ASF License warnings. |
| | | 212m 10s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[lines 236-239] |
| JDK v1.8.0_72 Failed junit tests | hadoop.hbase.ipc.TestSimpleRpcScheduler |
| JDK v1.7.0_95 Failed junit tests | hadoop.hbase.mapreduce.TestImportExport |
| | hadoop.hbase.master.procedure.TestModifyNamespaceProcedure |

|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1 Server=1.9.1 Image:yetus/hbase:date2016-02-27 |
|
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170430#comment-15170430 ] Hudson commented on HBASE-15181:
SUCCESS: Integrated in HBase-Trunk_matrix #744 (See [https://builds.apache.org/job/HBase-Trunk_matrix/744/])
HBASE-15181 A simple implementation of date based tiered compaction (tedyu: rev f7f96b9fb70f5b2243558cf531ab7fa51162e656)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDateTieredCompaction.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/RatioBasedCompactionPolicy.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestDefaultCompactSelection.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/MockStoreFile.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/DateTieredCompactionPolicy.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
HBASE-15181 adds TestCompactionPolicy which was missing in first commit (tedyu: rev 03ffb30efe341c226a19b4e80ec0e3352e55806c)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionPolicy.java
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-branch-1.patch, HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170322#comment-15170322 ] Yong Zhang commented on HBASE-15181:
Seems TestCompactionPolicy was missed in the push.
> A simple implementation of date based tiered compaction
> ---
>
> Key: HBASE-15181
> URL: https://issues.apache.org/jira/browse/HBASE-15181
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Fix For: 2.0.0, 1.3.0, 0.98.19
>
> Attachments: HBASE-15181-master-v1.patch, HBASE-15181-master-v2.patch, HBASE-15181-master-v3.patch, HBASE-15181-master-v4.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170252#comment-15170252 ] Ted Yu commented on HBASE-15181: Please fill in release notes. Thanks
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170195#comment-15170195 ] Enis Soztutar commented on HBASE-15181: OK, the HadoopQA run picked up the previous result, then. Anyway, let's commit this.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170192#comment-15170192 ] Ted Yu commented on HBASE-15181: There was a rejected hunk in TestDefaultCompactSelection.java when I tried to apply patch v4 on branch-1. Mind attaching a patch for branch-1?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170125#comment-15170125 ] Clara Xiong commented on HBASE-15181: It seems neither @Nonnull nor suppressing the warning works with the FindBugs version used by HadoopQA. The well-known workaround is what I used: throw an NPE for null input, and it worked. The latest patch, v4, has that.
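The workaround described above (check the possibly-null argument and throw a NullPointerException instead of dereferencing it) can be sketched as follows. This is a hypothetical illustration of the pattern, not the code from patch v4; `MaxTimestampFilter` and `StoreFileStub` are invented stand-ins for the real store-file types named in the FindBugs report.

```java
// Hypothetical illustration of the "throw NPE for null input" workaround.
// MaxTimestampFilter and StoreFileStub are invented names, not HBase classes.
class MaxTimestampFilter {

  /** Stand-in for a store file; only its max timestamp matters here. */
  static final class StoreFileStub {
    final long maxTimestamp;

    StoreFileStub(long maxTimestamp) {
      this.maxTimestamp = maxTimestamp;
    }
  }

  private final long cutoffMillis;

  MaxTimestampFilter(long cutoffMillis) {
    this.cutoffMillis = cutoffMillis;
  }

  /** Returns true if the file holds data at or after the cutoff. */
  boolean apply(StoreFileStub storeFile) {
    if (storeFile == null) {
      // Explicit null check and throw: the workaround reported in the comment above
      // for the "must be non-null but is marked as nullable" warning.
      throw new NullPointerException("storeFile must not be null");
    }
    return storeFile.maxTimestamp >= cutoffMillis;
  }
}
```

The explicit throw also makes the method's contract visible to readers: a null file is a programming error, not a value to be filtered out.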
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170089#comment-15170089 ] Enis Soztutar commented on HBASE-15181: bq. And still need to add some new configurations to fit our requirements. Can open new issues when this getting in. Sounds good. We can do them in follow-up issues.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170063#comment-15170063 ] Enis Soztutar commented on HBASE-15181: bq. java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=127: /bin/ls: error while loading shared libraries: libpcre.so.3: failed to map segment from shared object: Permission denied Node H0 may be having problems running the unit tests. It seems the fix for the FindBugs warning did not take effect: https://builds.apache.org/job/PreCommit-HBASE-Build/729/artifact/patchprocess/new-findbugs-hbase-server.html
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170040#comment-15170040 ] Ted Yu commented on HBASE-15181: Test failures don't seem to be related to the patch. +1 from me.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170027#comment-15170027 ] Hadoop QA commented on HBASE-15181:
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 2m 53s | master passed |
| +1 | compile | 0m 42s | master passed with JDK v1.8.0_72 |
| +1 | compile | 0m 37s | master passed with JDK v1.7.0_95 |
| +1 | checkstyle | 4m 18s | master passed |
| +1 | mvneclipse | 0m 18s | master passed |
| +1 | findbugs | 2m 3s | master passed |
| +1 | javadoc | 0m 34s | master passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 34s | master passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 3m 54s | the patch passed |
| +1 | mvneclipse | 0m 18s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 24m 59s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| -1 | findbugs | 2m 9s | hbase-server introduced 1 new FindBugs issues. |
| +1 | javadoc | 0m 30s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 38s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 129m 21s | hbase-server in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 86m 41s | hbase-server in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 32s | Patch does not generate ASF License warnings. |
| | | 263m 37s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[lines 236-239] |
| JDK v1.8.0_72 Timed out junit tests | org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient |
| JDK v1.7.0_95 Timed out junit tests | org.apache.hadoop.hbase.mapreduce.TestRowCounter |
| | org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles |
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168754#comment-15168754 ] Hadoop QA commented on HBASE-15181:
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 3m 14s | master passed |
| +1 | compile | 0m 32s | master passed with JDK v1.8.0_72 |
| +1 | compile | 0m 34s | master passed with JDK v1.7.0_95 |
| +1 | checkstyle | 4m 0s | master passed |
| +1 | mvneclipse | 0m 18s | master passed |
| +1 | findbugs | 1m 53s | master passed |
| +1 | javadoc | 0m 25s | master passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 32s | master passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 31s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 31s | the patch passed |
| +1 | compile | 0m 34s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 34s | the patch passed |
| +1 | checkstyle | 4m 13s | the patch passed |
| +1 | mvneclipse | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 24m 24s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| -1 | findbugs | 2m 7s | hbase-server introduced 1 new FindBugs issues. |
| +1 | javadoc | 0m 25s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 34s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 84m 19s | hbase-server in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 84m 14s | hbase-server in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 21s | Patch does not generate ASF License warnings. |
| | | 226m 48s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[line 238] |
| JDK v1.8.0_72 Failed junit tests | hadoop.hbase.regionserver.TestRegionServerMetrics |
| JDK v1.7.0_95 Failed junit tests | hadoop.hbase.regionserver.TestRegionServerMetrics |

|| Subsystem || Report/Notes ||
| Docker | Client=1.9.1 Server=1.9.1 Image:yetus/hbase:date2016-02-26 |
| JIRA Patch URL |
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168473#comment-15168473 ] Hadoop QA commented on HBASE-15181:
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 2m 42s | master passed |
| +1 | compile | 0m 41s | master passed with JDK v1.8.0_72 |
| +1 | compile | 0m 35s | master passed with JDK v1.7.0_95 |
| +1 | checkstyle | 4m 8s | master passed |
| +1 | mvneclipse | 0m 17s | master passed |
| +1 | findbugs | 2m 13s | master passed |
| +1 | javadoc | 0m 50s | master passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 51s | master passed with JDK v1.7.0_95 |
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 0m 58s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 58s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 50s | the patch passed |
| -1 | checkstyle | 4m 21s | Patch generated 1 new checkstyle issues in hbase-server (total was 83, now 79). |
| +1 | mvneclipse | 0m 25s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | hadoopcheck | 30m 18s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| -1 | findbugs | 3m 15s | hbase-server introduced 1 new FindBugs issues. |
| +1 | javadoc | 1m 6s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 1m 0s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 142m 41s | hbase-server in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 110m 30s | hbase-server in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 17s | Patch does not generate ASF License warnings. |
| | | 309m 37s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[line 239] |
| JDK v1.8.0_72 Failed junit tests | hadoop.hbase.client.TestBlockEvictionFromClient |
| | hadoop.hbase.regionserver.TestRegionServerMetrics |
| | hadoop.hbase.quotas.TestQuotaThrottle |
| JDK v1.8.0_72 Timed out junit tests |
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168302#comment-15168302 ] Duo Zhang commented on HBASE-15181: +1. This feature is very important for our MiCloud service (https://i.mi.com/, a service like iCloud). We still need to add some new configurations to fit our requirements; we can open new issues once this gets in. Thanks. > A simple implementation of date based tiered compaction > --- > > Key: HBASE-15181 > URL: https://issues.apache.org/jira/browse/HBASE-15181 > Project: HBase > Issue Type: New Feature > Components: Compaction >Reporter: Clara Xiong >Assignee: Clara Xiong > Fix For: 2.0.0, 1.3.0, 0.98.19 > > Attachments: HBASE-15181-master-v1.patch, > HBASE-15181-master-v2.patch, HBASE-15181-v1.patch, HBASE-15181-v2.patch > > > This is a simple implementation of date-based tiered compaction similar to > Cassandra's for the following benefits: > 1. Improve date-range-based scans by structuring store files in a date-based > tiered layout. > 2. Reduce compaction overhead. > 3. Improve TTL efficiency. > Perfect fit for use cases that: > 1. have mostly date-based data writes and scans, with a focus on the most recent > data. > 2. never or rarely delete data. > Out-of-order writes are handled gracefully. Time range overlapping among > store files is tolerated and the performance impact is minimized. > Configuration can be set in hbase-site.xml or overridden at the per-table or > per-column-family level via the hbase shell. > Design spec is at > https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168099#comment-15168099 ] Hadoop QA commented on HBASE-15181:
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | hbaseanti | 0m 1s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
| +1 | mvninstall | 3m 14s | master passed |
| +1 | compile | 0m 40s | master passed with JDK v1.8.0_72 |
| +1 | compile | 0m 37s | master passed with JDK v1.7.0_95 |
| +1 | checkstyle | 4m 6s | master passed |
| +1 | mvneclipse | 0m 17s | master passed |
| +1 | findbugs | 1m 58s | master passed |
| +1 | javadoc | 0m 31s | master passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 34s | master passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 0m 38s | the patch passed |
| +1 | compile | 0m 35s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 3m 59s | the patch passed |
| +1 | mvneclipse | 0m 17s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 18 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | hadoopcheck | 26m 39s | Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. |
| -1 | findbugs | 2m 18s | hbase-server introduced 1 new FindBugs issues. |
| +1 | javadoc | 0m 31s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 0m 36s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 107m 3s | hbase-server in the patch failed with JDK v1.8.0_72. |
| -1 | unit | 101m 2s | hbase-server in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 18s | Patch does not generate ASF License warnings. |
| | | 257m 23s | |

|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | storeFile must be non-null but is marked as nullable At DateTieredCompactionPolicy.java:[line 240] |
| JDK v1.8.0_72 Failed junit tests | hadoop.hbase.regionserver.TestRegionServerMetrics |
| | hadoop.hbase.replication.TestReplicationKillSlaveRS |
| JDK v1.7.0_95 Failed junit tests | hadoop.hbase.regionserver.TestRegionServerMetrics |
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167663#comment-15167663 ] Enis Soztutar commented on HBASE-15181: --- I think this is ready to go in. Let's do a Hadoop QA run. > A simple implementation of date based tiered compaction > --- > > Key: HBASE-15181 > URL: https://issues.apache.org/jira/browse/HBASE-15181 > Project: HBase > Issue Type: New Feature > Components: Compaction >Reporter: Clara Xiong >Assignee: Clara Xiong > Fix For: 2.0.0, 1.3.0, 0.98.19 > > Attachments: HBASE-15181-master-v1.patch, HBASE-15181-v1.patch, > HBASE-15181-v2.patch > > > This is a simple implementation of date-based tiered compaction similar to > Cassandra's for the following benefits: > 1. Improve date-range-based scan by structuring store files in date-based > tiered layout. > 2. Reduce compaction overhead. > 3. Improve TTL efficiency. > Perfect fit for the use cases that: > 1. have mostly date-based data writes and scans and a focus on the most recent > data. > 2. never or rarely delete data. > Out-of-order writes are handled gracefully so the data will still get to the > right store file for time-range-scan and re-compaction with existing store > file in the same time window is handled by ExploringCompactionPolicy. > Time range overlapping among store files is tolerated and the performance > impact is minimized. > Configuration can be set at hbase-site or overridden at per-table or > per-column-family level by hbase shell. > Design spec is at > https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167460#comment-15167460 ] Clara Xiong commented on HBASE-15181: - No. The change I made to the algorithm handles those cases gracefully and falls back to behaving like exploring compaction, as Enis and I discussed.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167430#comment-15167430 ] Ted Yu commented on HBASE-15181: bq. seqId and timestamp are in completely opposite orders Is there a metric showing the above scenario, so that users can switch back to exploring compaction?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167422#comment-15167422 ] Clara Xiong commented on HBASE-15181: - [~tedyu] The change to the algorithm is very minor and has no performance impact in our case. We have some out-of-order data due to replication lag, mostly much smaller than the flush interval and even smaller than the base window. We are pushing the new patch to production since we don't want to fork. We will keep collecting metrics, but I don't expect any difference. I will share the results when they are ready. The most impacted cases are: 1. seqId and timestamp are in completely opposite orders, most likely as a result of business logic. 2. Bulk load files carry -1 as seqId when the user explicitly turns off "hbase.mapreduce.bulkload.assign.sequenceNumbers". I don't recommend this compaction policy for these cases. In these worst-case scenarios, the policy falls back to exploring compaction. [~vrodionov] Time-series data that is loaded periodically with minimal time range overlap will perform perfectly in this case, with the base window set to cover the load interval. Some users may have occasional bulk-loaded data that is out of proportion to the files on the same tier, and they will pay some scan performance penalty. As time passes and those files move to higher tiers, the penalty diminishes.
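Ted asked above whether a metric exposes the "completely opposite orders" case; this thread does not name one. As a purely hypothetical sketch (not part of the patch and not an existing HBase metric), a store's deviation from the ideal case could be quantified by sorting its files by seqId and counting pairs whose maxTimestamp order disagrees. `StoreFileMeta` and `inversionRatio` are invented names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: measures disagreement between seqId order and timestamp order. */
final class OrderingCheck {

  /** Minimal stand-in for store file metadata (hypothetical type). */
  static final class StoreFileMeta {
    final long seqId;
    final long maxTimestamp;

    StoreFileMeta(long seqId, long maxTimestamp) {
      this.seqId = seqId;
      this.maxTimestamp = maxTimestamp;
    }
  }

  private OrderingCheck() {}

  /**
   * Returns the fraction of file pairs whose maxTimestamp order is the reverse of
   * their seqId order: 0.0 when timestamps follow seqIds, 1.0 when timestamps
   * strictly decrease as seqIds increase. O(n^2), acceptable for a store's few files.
   */
  static double inversionRatio(List<StoreFileMeta> files) {
    List<StoreFileMeta> sorted = new ArrayList<>(files);
    sorted.sort(Comparator.comparingLong((StoreFileMeta f) -> f.seqId));
    long pairs = 0;
    long inversions = 0;
    for (int i = 0; i < sorted.size(); i++) {
      for (int j = i + 1; j < sorted.size(); j++) {
        pairs++;
        // A newer file (higher seqId) holding strictly older data is an inversion.
        if (sorted.get(j).maxTimestamp < sorted.get(i).maxTimestamp) {
          inversions++;
        }
      }
    }
    return pairs == 0 ? 0.0 : (double) inversions / pairs;
  }
}
```

A ratio near 1.0 would correspond to the first worst case Clara lists; a small ratio would match the occasional replication-lag disorder she describes.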
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163166#comment-15163166 ] Ted Yu commented on HBASE-15181: Since the latest patch revises the algorithm significantly, a new round of performance verification on a cluster is desirable.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160244#comment-15160244 ] Anoop Sam John commented on HBASE-15181: Excluding large files won't exclude files in between: only the leading run of oversized files is skipped, and the loop breaks at the first file that is not too large.
{code}
private ArrayList<StoreFile> skipLargeFiles(ArrayList<StoreFile> candidates,
    boolean mayUseOffpeak) {
  int pos = 0;
  while (pos < candidates.size() && !candidates.get(pos).isReference()
      && (candidates.get(pos).getReader().length() > comConf.getMaxCompactSize(mayUseOffpeak))) {
    ++pos;
  }
  if (pos > 0) {
    LOG.debug("Some files are too large. Excluding " + pos
        + " files from compaction candidates");
    candidates.subList(0, pos).clear();
  }
  return candidates;
}
{code}
Bulk load exclusion, on the other hand, does exclude bulk-loaded files that sit in between.
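To make the difference Anoop describes concrete, here is a simplified, self-contained model (not HBase code). The `FileInfo` type is hypothetical, its `size` stands in for `getReader().length()`, and reference files are ignored. It mirrors the loop above for large files, contrasts it with a filter that drops bulk-loaded files wherever they appear (as Anoop describes the bulk-load exclusion), and checks whether the remaining selection is still an unbroken run in seqId order, which is the property behind the MVCC question raised in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the two exclusion styles; not HBase code. */
final class ExclusionDemo {

  /** Hypothetical stand-in for a store file, listed oldest first (ascending seqId). */
  static final class FileInfo {
    final String name;
    final long size;
    final boolean bulkLoaded;

    FileInfo(String name, long size, boolean bulkLoaded) {
      this.name = name;
      this.size = size;
      this.bulkLoaded = bulkLoaded;
    }
  }

  private ExclusionDemo() {}

  /** Like skipLargeFiles: drops only the leading run of files larger than maxSize. */
  static List<FileInfo> skipLargePrefix(List<FileInfo> candidates, long maxSize) {
    int pos = 0;
    while (pos < candidates.size() && candidates.get(pos).size > maxSize) {
      ++pos;
    }
    return new ArrayList<>(candidates.subList(pos, candidates.size()));
  }

  /** Drops every bulk-loaded file wherever it sits, which can leave gaps. */
  static List<FileInfo> filterBulkLoaded(List<FileInfo> candidates) {
    List<FileInfo> result = new ArrayList<>();
    for (FileInfo f : candidates) {
      if (!f.bulkLoaded) {
        result.add(f);
      }
    }
    return result;
  }

  /** True if selected is an unbroken run of consecutive files from all. */
  static boolean isContiguous(List<FileInfo> all, List<FileInfo> selected) {
    if (selected.isEmpty()) {
      return true;
    }
    int start = all.indexOf(selected.get(0));
    if (start < 0 || start + selected.size() > all.size()) {
      return false;
    }
    for (int i = 0; i < selected.size(); i++) {
      if (all.get(start + i) != selected.get(i)) {
        return false;
      }
    }
    return true;
  }
}
```

For files a (500 bytes), b (10), c (900, bulk-loaded) and d (20) with a 100-byte limit, `skipLargePrefix` keeps b, c, d (c survives because the loop already stopped at b), while `filterBulkLoaded` keeps a, b, d, which `isContiguous` reports as non-contiguous.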
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160223#comment-15160223 ] Clara Xiong commented on HBASE-15181: - [~enis] Thank you for the clarification. But my question was this: right now, the default compaction policy can be configured to exclude bulk-loaded files and large files. If a user turns either option on, some files will be excluded from compaction selection. That may result in a compaction of files that are not contiguous when sorted by seqId. Won't that cause an MVCC issue?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160139#comment-15160139 ] Enis Soztutar commented on HBASE-15181: --- bq. But I want to add a tweak to this proposal on how to handle late-arriving data. I want to compact the out-of-order data with newer files other than older ones. We are using the maxTs to select the tier, rather than minTs, so isn't this already true? bq. We have observed drastic IO reduction for the scans. Great. This will be a nice addition to HBase. Let me look at the patch. bq. I am wondering how we have maintained mvcc with Ratio-basedCompactionPolicy and its derived class ExploringCompactonPolicy when we allow filtering bulk-load and skip large files? At bulk-load time we assign a seqId to the bulk-loaded files. We execute a flush beforehand to make sure that the assigned sequenceId does not overlap with the sequenceIds of the in-memory data. We have a store-level read/write lock that coordinates bulk-loaded files and file selection for compaction. Is this what you were asking about?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160057#comment-15160057 ] Ted Yu commented on HBASE-15181: Clara: would you mind attaching your patch here for a QA run?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160026#comment-15160026 ] Clara Xiong commented on HBASE-15181: - I updated the design spec and uploaded the new patch at https://reviews.apache.org/r/43114/. Major changes: 1. Sort by seqId. A new test was added. 2. Handle future data. A new test was added. 3. Other code review feedback.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158362#comment-15158362 ] Clara Xiong commented on HBASE-15181: - [~enis] I am wondering how we have maintained MVCC with RatioBasedCompactionPolicy and its derived class ExploringCompactionPolicy when we allow filtering bulk-loaded files and skipping large files. I followed them to make the behavior consistent, but I wonder whether that allows non-contiguous compaction. By default, both options are turned off.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153385#comment-15153385 ] Clara Xiong commented on HBASE-15181: - [~enis] [~vrodionov] I greatly appreciate all the callouts and suggestions. I agree we have to guarantee correctness, and I think the first proposal carries the smaller trade-off. We want to construct the tiers to maximize performance for time-range scans on time-series data. Scan APIs are typically based on a look-back window over data timestamps, so we want to use data timestamps instead of creation time to align the tiers with the scans. Also, creation time may not be as reliable as sequence id as a monotonic indicator. [~vrodionov] I carefully thought about Enis' first proposal. It should work, since we know which tier and compaction window a store file belongs to as long as we know the current time and the file's maxTimestamp. We don't need the sequenceId to build the tiers. But I want to add a tweak to this proposal on how to handle late-arriving data: I want to compact the out-of-order data with newer files rather than older ones. Since we don't write future data, the worst case is that files on the lower tiers have long tails, instead of the data going to a higher tier. The additional cost of a long tail is the cost of scanning newer and smaller files. Given the tiered design, the additional data we need to scan is at most the tail size plus the current window size. This will also reduce the chance of recompacting small files into an out-of-proportion file. For bulk load scenarios, bulk-loaded files currently carry 0 as sequenceId, which lands them at the highest tier. It is configurable to use the sequenceId at the time of creation instead, which lands them at the lower tiers. We will need to call out that users should decide based on the data timestamps relative to the tiers and on the access pattern. Please let me know what you think.
[~enis] We do have performance results from production on a very large cluster replicated across multiple DCs, serving many concurrent time-range scans with different look-back windows. We are collecting more. I will share them externally once they are ready, most likely next week. We have observed drastic IO reduction for the scans.
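Clara's point that a file's window is determined by the current time and its maxTimestamp alone can be illustrated with a small, hypothetical model. Assumptions (not taken from the patch): windows start at `baseMillis` wide and are aligned to multiples of their own size, so their boundaries stay fixed as time advances; a window widens by a factor of `windowsPerTier` only when its start is also aligned to the wider size; timestamps at or after `now` fall into the newest (incoming) window. `TierWindows.windowFor` is an invented name.

```java
/** Hypothetical model of date-tiered window assignment; not the patch's code. */
final class TierWindows {

  private TierWindows() {}

  /**
   * Returns {tier, windowStartMillis} for the window containing ts, walking back
   * from the window that contains now. Requires ts >= 0, now >= 0,
   * baseMillis > 0 and windowsPerTier >= 2.
   */
  static long[] windowFor(long ts, long now, long baseMillis, int windowsPerTier) {
    if (ts < 0 || now < 0 || baseMillis <= 0 || windowsPerTier < 2) {
      throw new IllegalArgumentException("invalid arguments");
    }
    long size = baseMillis;
    long index = now / baseMillis; // window start = index * size
    long tier = 0;
    while (true) {
      long start = index * size;
      // Everything at or after start belongs here (earlier windows were already
      // ruled out), so future timestamps land in the incoming window.
      if (ts >= start) {
        return new long[] {tier, start};
      }
      if (index % windowsPerTier == 0) {
        // start is aligned to the wider size: step into a wider window ending at start.
        size *= windowsPerTier;
        index = index / windowsPerTier - 1;
        tier++;
      } else {
        // Step back one window of the current size.
        index--;
      }
    }
  }
}
```

For example, with `baseMillis = 10`, `windowsPerTier = 3` and `now = 95`, a file with maxTimestamp 85 maps to the tier-1 window starting at 60, and a file with maxTimestamp 200 (future data) maps to the incoming window starting at 90. Because only `now` and the timestamp are inputs, no sequenceId is needed to place the file, which is the property the comment relies on.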
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151574#comment-15151574 ] Vladimir Rodionov commented on HBASE-15181: --- One major use case still requires additional attention with the OLDEST_CREATION_TS approach: what about the initial bulk load of a large data set?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151558#comment-15151558 ] Enis Soztutar commented on HBASE-15181: --- bq. Then we can use this OLDEST_CREATION_TIME instead of maxTimestamp. We will use date ranges for OLDEST_CREATION_TIME - not for date timestamps, You are right. The CREATE_TIME_TS gets updated when files are compacted. If we keep OLDEST_CREATION_TS for newly compacted files, similar to how we keep seqIds, and use OLDEST_CREATION_TS to select the tier, it should work.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151553#comment-15151553 ] Vladimir Rodionov commented on HBASE-15181: --- Sequence IDs can not be tiered. Yes? They are not time ranges I think we need to enhance HFile/compactions to keep the oldest HFile creation time (which is equals to creation time of a newly flashed store file or the oldest creation time of files being compacted). Then we can use this OLDEST_CREATION_TIME instead of maxTimestamp. We will use date ranges for OLDEST_CREATION_TIME - not for date timestamps There is one to one correlation between Seq Id and OLDEST_CREATION_TIME: the ordering by OLDEST_CREATION_TIME and by sequence ID is the same. > A simple implementation of date based tiered compaction > --- > > Key: HBASE-15181 > URL: https://issues.apache.org/jira/browse/HBASE-15181 > Project: HBase > Issue Type: New Feature > Components: Compaction >Reporter: Clara Xiong >Assignee: Clara Xiong > Fix For: 2.0.0, 1.3.0, 0.98.19 > > Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch > > > This is a simple implementation of date-based tiered compaction similar to > Cassandra's for the following benefits: > 1. Improve date-range-based scan by structuring store files in date-based > tiered layout. > 2. Reduce compaction overhead. > 3. Improve TTL efficiency. > Perfect fit for the use cases that: > 1. has mostly date-based date write and scan and a focus on the most recent > data. > 2. never or rarely deletes data. > Out-of-order writes are handled gracefully so the data will still get to the > right store file for time-range-scan and re-compacton with existing store > file in the same time window is handled by ExploringCompactionPolicy. > Time range overlapping among store files is tolerated and the performance > impact is minimized. > Configuration can be set at hbase-site or overriden at per-table or > per-column-famly level by hbase shell. 
> Design spec is at > https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
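To make the OLDEST_CREATION_TIME proposal concrete, here is a minimal Java sketch. `StoreFileMeta`, `flush`, `compact` and `ordersAgree` are hypothetical names, not HBase classes. It assumes, as the thread discusses, that compactions only merge files that are contiguous in sequence ID; under that assumption the ordering by oldest creation time matches the ordering by sequence ID, while the second case in `main` shows how a non-contiguous merge would break the correspondence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OldestCreationTime {
    // Hypothetical per-file metadata (not an HBase class): the file's max sequence id
    // and the proposed OLDEST_CREATION_TIME.
    record StoreFileMeta(long seqId, long oldestCreationTime) {}

    // A flush produces a new file whose oldest creation time is simply the flush time.
    static StoreFileMeta flush(long seqId, long now) {
        return new StoreFileMeta(seqId, now);
    }

    // A compaction output keeps the max seqId of its inputs and the oldest creation time among them.
    static StoreFileMeta compact(List<StoreFileMeta> inputs) {
        long maxSeqId = Long.MIN_VALUE;
        long oldest = Long.MAX_VALUE;
        for (StoreFileMeta f : inputs) {
            maxSeqId = Math.max(maxSeqId, f.seqId());
            oldest = Math.min(oldest, f.oldestCreationTime());
        }
        return new StoreFileMeta(maxSeqId, oldest);
    }

    // True if ordering by seqId and ordering by oldest creation time agree.
    static boolean ordersAgree(List<StoreFileMeta> files) {
        List<StoreFileMeta> bySeqId = new ArrayList<>(files);
        bySeqId.sort(Comparator.comparingLong(StoreFileMeta::seqId));
        for (int i = 1; i < bySeqId.size(); i++) {
            if (bySeqId.get(i - 1).oldestCreationTime() > bySeqId.get(i).oldestCreationTime()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StoreFileMeta a = flush(10, 1_000), b = flush(20, 2_000), c = flush(30, 3_000);
        // Contiguous compaction (a, b): the two orderings still agree.
        System.out.println(ordersAgree(List.of(compact(List.of(a, b)), c))); // true
        // Non-contiguous compaction (a, c) skipping b: the orderings diverge.
        System.out.println(ordersAgree(List.of(compact(List.of(a, c)), b))); // false
    }
}
```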
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151531#comment-15151531 ] Enis Soztutar commented on HBASE-15181: --- [~claraxiong] this is great work, BTW. Thanks for pushing for this. I just wanted to bring one open item back to JIRA to see whether ordering files by timestamp rather than by seqId, and doing non-contiguous compactions, is acceptable:
{quote}
The tiered structure is built completely and solely on the data timestamps of the store files. We cannot sort by seqId at all. Any logic for updates/deletes depending on seqId would break. The user needs to guarantee that updates and deletes arrive in an order aligned with timestamp order. This compaction policy is pluggable, and this limitation will be lifted once the work to allow compaction out of seqId order is done. As you pointed out in the ticket: "What I was saying offline is that we can actually do something like HBASE-9905 and disallow client-settable timestamps, or do something like HBASE-10247 where the table pre-declares that we won't have same-ts edits, it should be possible to do non-contiguous compactions."
{quote}
Given that there are no hard guarantees as of now about whether the client can do out-of-order timestamp writes, can we always stay correct, and only lose compaction efficiency if the client does an excessive amount of these writes? Basically, if we can, I would like a system where the client gets the full benefit automatically if the timestamps follow seqId order, and where the results are still correct if they do not. If there are only occasional out-of-order writes, performance is not badly affected; if there are many, the compaction algorithm can behave badly.

I think we can achieve this with something like the following:
- Use max ts for store files, as in the design.
- Instead of ordering files by decreasing ts, order files by decreasing seqId.
- Iterating from highest seqId to lowest, find the tier that each file belongs to using maxTs. The only difference from the current algorithm is that during the iteration we should always assign tiers in increasing order: t0, t1, t2. This means that if out-of-order data is present and we end up with flushes whose maxTs is very old (let's say it falls into t2), then t1 and t0 would be empty and all files will be in t2 or higher. Otherwise (if you do not have out-of-order writes, or have them only occasionally), the behavior will be the same as in the design.

Alternatively, HFiles also have CREATE_TIME_TS, which is different from maxTimestamp: maxTS comes from the user data, while the HFile create time is the system time at which the HFile was written. If we do tier selection based on the HFile create time instead of the user's maxTs, then we might not have that problem at all. Again, if the user's timestamps actually correlate with the seqIds (or HFile create times), you would get all the benefits; otherwise, we would still return correct results, but compaction may not be optimal (I think it would be like falling back to the exploring policy). Anyway, this is just a suggestion to consider; I might not have thought of all the corner cases.

You are saying that this patch is also in production. Are there any numbers you've collected?
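A minimal sketch of the suggested iteration, assuming a simple hypothetical tier function in which each tier is `base` times wider than the previous one (this is not the patch's actual windowing). Files are visited from highest to lowest seqId, and a file's tier is never allowed to drop below a tier already reached, so a single out-of-order flush with a very old maxTs pushes every older file into that tier or higher.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MonotoneTiers {
    // Hypothetical store-file summary: name, max sequence id and max data timestamp.
    record FileInfo(String name, long seqId, long maxTs) {}

    // Hypothetical tier function: tier 0 holds ages below unit, tier 1 below unit*base,
    // tier 2 below unit*base^2, and so on (guarded against overflow).
    static int tierOf(long age, long unit, int base) {
        int tier = 0;
        long bound = unit;
        while (age >= bound && bound <= Long.MAX_VALUE / base) {
            bound *= base;
            tier++;
        }
        return tier;
    }

    // Walk files from highest to lowest seqId; each file gets the max of its own
    // maxTs-based tier and the tier already reached, so tiers never decrease.
    // The returned list is in that same (decreasing seqId) order.
    static List<Integer> assignTiers(List<FileInfo> files, long now, long unit, int base) {
        List<FileInfo> bySeqIdDesc = new ArrayList<>(files);
        bySeqIdDesc.sort(Comparator.comparingLong(FileInfo::seqId).reversed());
        List<Integer> tiers = new ArrayList<>();
        int current = 0;
        for (FileInfo f : bySeqIdDesc) {
            int own = tierOf(Math.max(0, now - f.maxTs()), unit, base);
            current = Math.max(current, own);
            tiers.add(current);
        }
        return tiers;
    }

    public static void main(String[] args) {
        // In-order data: tiers simply follow maxTs.
        System.out.println(assignTiers(List.of(
            new FileInfo("f3", 3, 95), new FileInfo("f2", 2, 85), new FileInfo("f1", 1, 40)),
            100, 10, 2)); // [0, 1, 3]
        // The newest flush carries very old data: every file lands in tier 3 or higher.
        System.out.println(assignTiers(List.of(
            new FileInfo("f3", 3, 40), new FileInfo("f2", 2, 95), new FileInfo("f1", 1, 92)),
            100, 10, 2)); // [3, 3, 3]
    }
}
```

The design choice mirrors the comment: correctness never depends on timestamps following seqId order, and only the quality of tiering degrades when they do not.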
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150900#comment-15150900 ] Clara Xiong commented on HBASE-15181: - Thank you. I have incorporated the suggestions and will upload a new patch shortly, once some of the open questions on RB are resolved.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150705#comment-15150705 ] Clara Xiong commented on HBASE-15181: - Sorry, I meant that I read your comments and the HStore code. You are right, [~enis].
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150019#comment-15150019 ] Clara Xiong commented on HBASE-15181: - Read the comments and code. It is not needed. Will remove that change.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149793#comment-15149793 ] Enis Soztutar commented on HBASE-15181: ---
bq. My estimate is it will take significant expansion to make it work with time boundaries and it would be less effort and cleaner if I create a new compactor
Alright. We do not want to duplicate code and work; that's why we were suggesting you consider that. I have not looked at it closely enough to say what would be involved, so we'll defer to your judgement.
bq. I can break this patch up to two patches: one for dynamic configuration per column family and the other for the pluggable DateTieredCompactionPolicy.
Are you talking about CompoundConfiguration? Did you see my comments above regarding that? If the CompoundConfiguration changes are needed, agreed, they can go in separately.
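The per-table and per-column-family overrides that CompoundConfiguration is meant to provide boil down to a layered lookup in which the most specific layer wins. The following is a hypothetical Java sketch of that lookup, not the CompoundConfiguration API; the keys `tier.base` and `tier.unit.ms` are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LayeredConfig {
    // Layers ordered from most specific (column family) to least specific (site file).
    private final List<Map<String, String>> layers;

    LayeredConfig(List<Map<String, String>> layers) {
        this.layers = layers;
    }

    // Return the value from the most specific layer that defines the key.
    Optional<String> get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical keys, for illustration only.
        Map<String, String> site = Map.of("tier.base", "4", "tier.unit.ms", "3600000");
        Map<String, String> table = Map.of("tier.base", "2");
        Map<String, String> family = Map.of();
        LayeredConfig conf = new LayeredConfig(List.of(family, table, site));
        System.out.println(conf.get("tier.base"));    // Optional[2]
        System.out.println(conf.get("tier.unit.ms")); // Optional[3600000]
    }
}
```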
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139645#comment-15139645 ] Clara Xiong commented on HBASE-15181: - Two updates:
1. Thanks to [~stack] and [~enis], I went through the StripeCompactor code to see whether I can leverage its multiple-output code to chop the files at tier boundaries. My estimate is that it would take significant expansion to make it work with time boundaries, and it would be less effort and cleaner to create a new compactor.
2. Thanks to [~vrodionov], [~stack] and [~enis]: after the discussion on why we need to compact contiguously, I realized there is a hole in the algorithm. I sort the files by the client-defined max timestamp, not by sequence ID. Although the algorithm still only selects store files that are contiguous in that order, they are not necessarily contiguous in sequence ID. The new changes for seq id will make it work.
I can break this patch up into two patches: one for dynamic configuration per column family and the other for the pluggable DateTieredCompactionPolicy. Please let me know what you think.
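The hole described above can be seen with a small check. This hypothetical helper (not from the patch) tests whether a selection of files forms a single unbroken run when all files are ordered by sequence ID. A flush carrying late-arriving data has a newer seqId but an older maxTs, so files that look adjacent when sorted by maxTs can straddle it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

public class ContiguityCheck {
    // Hypothetical store-file summary: name, max sequence id, max data timestamp.
    record SF(String name, long seqId, long maxTs) {}

    // True if the selected files form one unbroken run when all files are ordered by seqId.
    static boolean contiguousInSeqId(List<SF> all, Set<String> selected) {
        List<SF> bySeqId = new ArrayList<>(all);
        bySeqId.sort(Comparator.comparingLong(SF::seqId));
        int first = -1, last = -1, count = 0;
        for (int i = 0; i < bySeqId.size(); i++) {
            if (selected.contains(bySeqId.get(i).name())) {
                if (first < 0) first = i;
                last = i;
                count++;
            }
        }
        return count > 0 && last - first + 1 == count;
    }

    public static void main(String[] args) {
        // b was flushed after a but carries late-arriving old data (small maxTs).
        List<SF> files = List.of(new SF("a", 1, 100), new SF("b", 2, 10), new SF("c", 3, 110));
        // Sorted by maxTs (newest first) the files are c, a, b, so {c, a} looks adjacent...
        System.out.println(contiguousInSeqId(files, Set.of("c", "a"))); // false
        // ...but only {a, b} or {b, c} are contiguous in seqId.
        System.out.println(contiguousInSeqId(files, Set.of("a", "b"))); // true
    }
}
```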
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133258#comment-15133258 ] Ted Yu commented on HBASE-15181: Please incorporate the above and submit a new patch for a QA run.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133182#comment-15133182 ] Clara Xiong commented on HBASE-15181: - Yes, that serves the same purpose with the properly added check.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131670#comment-15131670 ] Ted Yu commented on HBASE-15181: w.r.t. the nested loop in getBuckets(), do you think the following is more readable?
{code}
while (it.hasNext()) {
  if (!target.onTarget(it.peek().getSecond())) {
    // If the file is too new for the target, skip it.
    if (target.compareToTimestamp(it.peek().getSecond()) < 0) {
      it.next();
    } else {
      // If the file is too old for the target, switch to higher
      // tier.
      target = target.nextTarget(tierBase);
    }
  }
  ArrayList<T> bucket = Lists.newArrayList();
  // Add all files of the same tier to current bucket
  while (it.hasNext() && target.onTarget(it.peek().getSecond())) {
    bucket.add(it.next().getFirst());
  }
  if (!bucket.isEmpty()) {
    buckets.add(bucket);
  }
}
{code}
Basically there is no need for the label. I ran TestTieredCompaction and it passed.
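For experimentation, here is a self-contained Java version of the restructured loop. The `Target` window logic is an assumption, modeled on the window target used by Cassandra's date-tiered compaction strategy rather than copied from the patch; `FileTs` is a hypothetical stand-in for `Pair<StoreFile, Long>`, and an index replaces Guava's `PeekingIterator` so the sketch needs only the JDK.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieredBucketsSketch {
    // Hypothetical stand-in for a store file: a name plus its max timestamp.
    record FileTs(String name, long maxTs) {}

    // One time window (assumed semantics): covers timestamps t with t / size == divPosition.
    record Target(long size, long divPosition) {
        int compareToTimestamp(long ts) {
            return Long.compare(divPosition, ts / size);
        }

        boolean onTarget(long ts) {
            return compareToTimestamp(ts) == 0;
        }

        // Step back one window in the current tier; at a tier boundary, widen by tierBase.
        Target nextTarget(int tierBase) {
            if (divPosition % tierBase > 0) {
                return new Target(size, divPosition - 1);
            }
            return new Target(size * tierBase, divPosition / tierBase - 1);
        }
    }

    static List<List<String>> getBuckets(List<FileTs> files, long timeUnit, int tierBase, long now) {
        List<FileTs> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(FileTs::maxTs).reversed()); // newest first
        List<List<String>> buckets = new ArrayList<>();
        Target target = new Target(timeUnit, now / timeUnit);
        int i = 0;
        while (i < sorted.size()) {
            long ts = sorted.get(i).maxTs();
            if (!target.onTarget(ts)) {
                if (target.compareToTimestamp(ts) < 0) {
                    i++; // file is newer than the target window: skip it
                } else {
                    target = target.nextTarget(tierBase); // file is older: move to an older window
                }
            }
            // Collect all consecutive files that fall in the current window.
            List<String> bucket = new ArrayList<>();
            while (i < sorted.size() && target.onTarget(sorted.get(i).maxTs())) {
                bucket.add(sorted.get(i++).name());
            }
            if (!bucket.isEmpty()) {
                buckets.add(bucket);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<FileTs> files = List.of(new FileTs("a", 105), new FileTs("b", 101), new FileTs("c", 95),
            new FileTs("d", 85), new FileTs("e", 70), new FileTs("f", 10));
        System.out.println(getBuckets(files, 10, 2, 100)); // [[a, b], [c, d], [e], [f]]
    }
}
```

With `timeUnit = 10`, `tierBase = 2` and `now = 100`, the windows visited are [100, 110), [80, 100), [40, 80) and [0, 40), so each bucket's width grows as the data ages.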
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129144#comment-15129144 ] Clara Xiong commented on HBASE-15181: - Uploaded to RB at https://reviews.apache.org/r/43114/.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129366#comment-15129366 ] Clara Xiong commented on HBASE-15181: - Thank you for your input. I will update the spec. We have date tiering like the Spotify blog, but with tweaks. The overall theme is that store files are tiered based on the max timestamp of their data. If new data arrives in perfect order, there is only one file per window, except in the incoming window or when files move across tiers. Out-of-order, late-arriving or bulk-loaded data falls into older windows for compaction selection and may trigger compactions there. For this case we let the user plug in a compaction policy, defaulting to exploring compaction, to reduce compaction storms. Another policy can be plugged in if the user wants to keep the file count small at the price of re-compaction. We use the max timestamp instead of the min (as in the Spotify blog) to reduce the performance penalty for out-of-order data, assuming no timestamp is set in the future. TTL works out of the box by skipping whole files. Major compaction is disabled unless triggered manually.
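One reading of why TTL "works out of the box": because files are grouped by max timestamp, old data ends up in files whose newest cell is also old, and such a file can be skipped or dropped whole once its max timestamp falls behind the TTL cutoff. The following hypothetical sketch shows only that whole-file check; `SF` and `expiredFiles` are illustrative names, not HBase code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TtlSkip {
    // Hypothetical store-file summary with the min and max cell timestamps it contains.
    record SF(String name, long minTs, long maxTs) {}

    // If even the newest cell is older than now - ttl, every cell in the file has expired.
    static boolean fullyExpired(SF f, long now, long ttlMs) {
        return f.maxTs() < now - ttlMs;
    }

    static List<String> expiredFiles(List<SF> files, long now, long ttlMs) {
        return files.stream()
            .filter(f -> fullyExpired(f, now, ttlMs))
            .map(SF::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SF> files = List.of(
            new SF("old", 100, 600), new SF("straddling", 650, 750), new SF("new", 900, 990));
        // TTL cutoff is 1000 - 300 = 700: only "old" lies entirely before it.
        System.out.println(expiredFiles(files, 1000, 300)); // [old]
    }
}
```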
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129372#comment-15129372 ] Clara Xiong commented on HBASE-15181: - With this implementation, we have an overall compaction policy and a per-window compaction policy that defaults to the exploring policy, so I have to use two separate configuration settings.
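The two-level arrangement (an overall date-tiered policy that delegates selection inside each window to a pluggable per-window policy) can be sketched as a strategy interface. `WindowPolicy`, `minFilesPolicy` and `selectCompaction` are hypothetical names, and the stand-in per-window policy is deliberately much simpler than ExploringCompactionPolicy.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoLevelPolicy {
    // Hypothetical per-window policy: chooses which files inside one window to compact.
    interface WindowPolicy {
        List<String> select(List<String> windowFiles);
    }

    // Simple stand-in (NOT ExploringCompactionPolicy): compact a window once it holds minFiles files.
    static WindowPolicy minFilesPolicy(int minFiles) {
        return files -> {
            if (files.size() >= minFiles) {
                return new ArrayList<>(files);
            }
            return List.of();
        };
    }

    // Overall policy: visit windows newest first and let the per-window policy decide.
    static List<String> selectCompaction(List<List<String>> windowsNewestFirst, WindowPolicy perWindow) {
        for (List<String> window : windowsNewestFirst) {
            List<String> picked = perWindow.select(window);
            if (!picked.isEmpty()) {
                return picked;
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        List<List<String>> windows = List.of(List.of("a"), List.of("b", "c", "d"), List.of("e", "f"));
        System.out.println(selectCompaction(windows, minFilesPolicy(3))); // [b, c, d]
        System.out.println(selectCompaction(windows, minFilesPolicy(4))); // []
    }
}
```

Keeping the per-window choice behind an interface is what makes the second configuration setting necessary: one setting picks the overall policy, the other picks the strategy used inside each window.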
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127037#comment-15127037 ] Vladimir Rodionov commented on HBASE-15181: --- Looking into the code: why is this method generic and static?
{code}
static <T> List<ArrayList<T>> getBuckets(Collection<Pair<T, Long>> files, long timeUnit, int tierBase, long now) {
}
{code}
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127052#comment-15127052 ] Clara Xiong commented on HBASE-15181: - Not really. The algorithm depends on the max timestamp. If a bulk-loaded file has a time span significantly bigger than its window, scan performance will suffer from the extra data being scanned. But once the files move into larger windows on higher tiers, the penalty decreases and eventually disappears.
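A rough way to quantify the bulk-load penalty: if a time-range scan has to read a whole file that overlaps its range, the useful fraction of that file shrinks as the file's time span grows relative to the scan range. The sketch below assumes, purely for illustration, that cell timestamps are spread uniformly across the file's span; `usefulFraction` is a hypothetical helper.

```java
public class ScanOverlap {
    // Fraction of a file's cells that a scan over [scanStart, scanEnd] actually needs,
    // assuming (illustration only) that cell timestamps are uniform over [minTs, maxTs].
    static double usefulFraction(long minTs, long maxTs, long scanStart, long scanEnd) {
        if (maxTs <= minTs) {
            // Degenerate span: the file is either fully inside the scan range or not at all.
            return (minTs >= scanStart && minTs <= scanEnd) ? 1.0 : 0.0;
        }
        long lo = Math.max(minTs, scanStart);
        long hi = Math.min(maxTs, scanEnd);
        return hi <= lo ? 0.0 : (double) (hi - lo) / (maxTs - minTs);
    }

    public static void main(String[] args) {
        // A bulk-loaded file spanning 1000 time units, scanned for the most recent 100 units.
        System.out.println(usefulFraction(0, 1000, 900, 1000));   // 0.1
        // A flushed file whose span lies entirely inside the scan range.
        System.out.println(usefulFraction(950, 1000, 900, 1000)); // 1.0
    }
}
```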
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126893#comment-15126893 ] Vladimir Rodionov commented on HBASE-15181: --- Thanks for the patch, [~claraxiong] You still rely on default major compaction for bulk loaded files, periodic major compactions are disabled, therefore the only way to compact bulk loaded files is to force major compaction manually. For many applications the new compaction policy won't give much benefit - they periodically do batch load and they will have to run major compaction after on a daily basis. Have you thought about that? > A simple implementation of date based tiered compaction > --- > > Key: HBASE-15181 > URL: https://issues.apache.org/jira/browse/HBASE-15181 > Project: HBase > Issue Type: New Feature > Components: Compaction >Reporter: Clara Xiong >Assignee: Clara Xiong > Fix For: 2.0.0 > > Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch > > > This is a simple implementation of date-based tiered compaction similar to > Cassandra's for the following benefits: > 1. Improve date-range-based scan by structuring store files in date-based > tiered layout. > 2. Reduce compaction overhead. > 3. Improve TTL efficiency. > Perfect fit for the use cases that: > 1. has mostly date-based date write and scan and a focus on the most recent > data. > 2. never or rarely deletes data. > Out-of-order writes are handled gracefully so the data will still get to the > right store file for time-range-scan and re-compacton with existing store > file in the same time window is handled by ExploringCompactionPolicy. > Time range overlapping among store files is tolerated and the performance > impact is minimized. > Configuration can be set at hbase-site or overriden at per-table or > per-column-famly level by hbase shell. 
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126906#comment-15126906 ] Clara Xiong commented on HBASE-15181: - [~vrodionov] I am thinking about a multiple-output solution for major compaction so that the output will be laid out perfectly. It can easily be plugged into this solution. If you have anything that works, could you share it?
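The multiple-output idea can be sketched as routing each cell of the merged compaction stream to the output whose time window contains the cell's timestamp. Each output file then covers exactly one window. This is a hypothetical in-memory model: `Cell`, `split` and the window boundaries are made up for illustration, and none of it is an HBase compactor API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiOutputCompactionSketch {
    record Cell(String row, long timestamp, String value) {}

    // Routes each cell of an already-merged, row-sorted stream to the output "file"
    // whose window contains the cell's timestamp. windowStarts must be non-empty;
    // cells older than the first start go to the oldest window.
    static TreeMap<Long, List<Cell>> split(List<Cell> merged, long[] windowStarts) {
        TreeMap<Long, List<Cell>> outputs = new TreeMap<>();
        for (long start : windowStarts) {
            outputs.put(start, new ArrayList<>());
        }
        for (Cell cell : merged) {
            Long key = outputs.floorKey(cell.timestamp());
            if (key == null) {
                key = outputs.firstKey(); // older than every boundary
            }
            outputs.get(key).add(cell); // row order is preserved within each output
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Cell> merged = List.of(
                new Cell("a", 995, "x"), new Cell("a", 700, "y"),
                new Cell("b", 1012, "z"), new Cell("c", 650, "w"));
        long[] starts = {640, 800, 960, 1000};
        for (Map.Entry<Long, List<Cell>> e : split(merged, starts).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A real compactor would also have to write each output as a separate HFile with its own metadata. The sketch only shows the routing decision.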
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126952#comment-15126952 ] Dave Latham commented on HBASE-15181: - It definitely seems to depend on the pattern of loads. If most bulk loads cover a limited time range, then the files will naturally move into the appropriate windows, and scans for recent data should include or skip them as appropriate. If the bulk loads cover data over the full history, then scans for recent data will end up including them, and touching the old data they contain as well, until the files age. So some sort of re-compaction (such as a major compaction that splits the data into windowed output files) would seem to help.
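Dave's distinction comes down to an overlap test between a scan's time range and each file's [min, max] timestamp range: a file can be skipped only if the two ranges do not overlap. A minimal sketch, assuming a half-open scan range and inclusive per-file bounds. The `StoreFileRange` type and the sample values are hypothetical:

```java
import java.util.List;

class TimeRangePruningSketch {
    // Per-file timestamp bounds (inclusive), as a simplified stand-in for file metadata.
    record StoreFileRange(String name, long minTs, long maxTs) {}

    // A scan over [scanMin, scanMax) must read a file whenever the ranges overlap.
    static boolean mustRead(StoreFileRange f, long scanMin, long scanMax) {
        return f.minTs() < scanMax && f.maxTs() >= scanMin;
    }

    public static void main(String[] args) {
        List<StoreFileRange> files = List.of(
                new StoreFileRange("recent-flush", 990, 1030),
                new StoreFileRange("bulk-last-day", 900, 960),
                new StoreFileRange("bulk-full-history", 0, 1020));
        long scanMin = 1000, scanMax = 1040; // a "recent data" scan
        for (StoreFileRange f : files) {
            System.out.println(f.name() + " read=" + mustRead(f, scanMin, scanMax));
        }
    }
}
```

Here the recent-data scan reads the full-history bulk file even though most of its cells are old. It skips the bulk file limited to the previous day.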
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127096#comment-15127096 ] Vladimir Rodionov commented on HBASE-15181: --- {quote} Can you elaborate on why bulk loaded files on a limited time range require major compaction {quote} According to selectCompaction in RatioBasedCompactionPolicy:
{code}
if (!isTryingMajor && !isAfterSplit) {
  // We are not compacting all files, let's see what files are applicable
  candidateSelection = filterBulk(candidateSelection);
  candidateSelection = applyCompactionPolicy(candidateSelection, mayUseOffPeak, mayBeStuck);
  candidateSelection = checkMinFilesCriteria(candidateSelection);
}
{code}
we always filter bulk loaded files out in minor compactions. The reason is not clear to me. Maybe some of our gurus can answer this question? [~enis], [~saint@gmail.com], [~apurtell]?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127112#comment-15127112 ] Vladimir Rodionov commented on HBASE-15181: --- [~claraxiong], why is it TieredCompactionPolicy and not DateTieredCompactionPolicy?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127166#comment-15127166 ] Clara Xiong commented on HBASE-15181: - Whether to skip bulk-loaded files in minor compaction is configurable via "hbase.mapreduce.hfileoutputformat.compaction.exclude". This was added for the following issues: https://issues.apache.org/jira/browse/HBASE-3690 and https://issues.apache.org/jira/browse/HBASE-3404. By default it is off for the existing compaction policies to avoid compaction storms. We may want to recommend that people turn it on for date-based tiered compaction.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127134#comment-15127134 ] Vladimir Rodionov commented on HBASE-15181: --- The original idea of DateTieredCompactionPolicy (as in HBASE-14477) was to improve reads of the most recent data and to reduce overall compaction-related IO. The proposed simple implementation will meet these requirements only for applications without periodic bulk loading and with mostly in-order data streams (that is probably the use case at Yahoo?). Periodic bulk loading and significantly out-of-order data streams reduce the value of this implementation considerably. Before we move on with TCP/DTCP we should figure out how to solve these problems, maybe in a separate JIRA, since handling bulk-loaded data is a generic problem, not a TCP-specific one. Two questions:
# Why is bulk-loaded data excluded from minor compaction?
# Why can't we select a non-contiguous range of store files for compaction?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127195#comment-15127195 ] Vladimir Rodionov commented on HBASE-15181: --- Yes, you are right.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127284#comment-15127284 ] Dave Latham commented on HBASE-15181: - Those are great questions, Vladimir; I hope people who know better chime in and confirm or deny. Poking at the issues that Clara linked, I can speculate. The default compaction algorithm at the time (ratio-based?) was intended to handle stores shaped so that older files are larger than recent ones. So it sounds like the policy would not intelligently handle a smaller bulk-load file that is sorted as the oldest, and would end up doing a large, wasteful compaction. As for contiguous-only compactions, I think that's because sequence IDs are stored only per file, not per cell. So if you want to compare the sequence IDs of two cells with identical keys, you need a strict ordering on HFile sequence IDs. If you compact out-of-order HFiles together, you no longer have strictly ordered sequence IDs. Both are speculation.
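Dave's argument about contiguous selection can be checked with a toy model. In it, each file carries a single sequence ID, and for an identical row and timestamp the file with the higher sequence ID wins. The rule that a compacted file inherits the maximum sequence ID of its inputs is an assumption of this sketch. `ModelFile`, `read` and `compact` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SeqIdOrderingSketch {
    // Simplified model: ONE sequence id per file, no per-cell seqIds.
    // Cell keys are "row@timestamp".
    record ModelFile(long seqId, Map<String, String> cells) {}

    // Read path: for a given row@timestamp, the file with the highest seqId wins.
    static String read(List<ModelFile> files, String key) {
        String value = null;
        long best = Long.MIN_VALUE;
        for (ModelFile f : files) {
            if (f.cells().containsKey(key) && f.seqId() > best) {
                best = f.seqId();
                value = f.cells().get(key);
            }
        }
        return value;
    }

    // Compacts the inputs into one file that takes the max input seqId (assumption).
    static ModelFile compact(List<ModelFile> inputs) {
        List<ModelFile> sorted = new ArrayList<>(inputs);
        sorted.sort((a, b) -> Long.compare(a.seqId(), b.seqId()));
        Map<String, String> merged = new HashMap<>();
        for (ModelFile f : sorted) {
            merged.putAll(f.cells()); // higher-seqId files overwrite lower ones
        }
        return new ModelFile(sorted.get(sorted.size() - 1).seqId(), merged);
    }

    public static void main(String[] args) {
        ModelFile f1 = new ModelFile(10, Map.of("foo@100", "v1"));
        ModelFile f2 = new ModelFile(20, Map.of("foo@100", "v2"));
        ModelFile f3 = new ModelFile(30, Map.of("bar@100", "x"));
        System.out.println("before: " + read(List.of(f1, f2, f3), "foo@100"));
        ModelFile contiguous = compact(List.of(f2, f3));
        System.out.println("contiguous {f2,f3}: " + read(List.of(f1, contiguous), "foo@100"));
        ModelFile gap = compact(List.of(f1, f3));
        System.out.println("non-contiguous {f1,f3}: " + read(List.of(gap, f2), "foo@100"));
    }
}
```

It prints `v2`, `v2`, then `v1`. Compacting the adjacent pair keeps the newest value visible. Compacting around `f2` gives the older `v1` a sequence ID of 30, which now shadows `v2`.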
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127087#comment-15127087 ] Dave Latham commented on HBASE-15181: - Thanks for the feedback and review, Vladimir. Can you elaborate on why bulk loaded files on a limited time range require major compaction?
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127088#comment-15127088 ] Vladimir Rodionov commented on HBASE-15181: --- {quote} Not really. {quote} This is not a TieredCompactionPolicy-specific issue per se, but let us look at the following scenario (one of our customers, by the way): we bulk load data once a day and that is it. Each time, a new store file is created (per region/CF) that is not eligible for minor compaction. The only way to compact these files is to force a major compaction periodically. To compact the 10 new files (for the last 10 days), you need to compact the whole region (possibly months or years of data). As I pointed out, this is not a TCP-specific issue; it exists for all other size-tiered compaction policies.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127398#comment-15127398 ] Enis Soztutar commented on HBASE-15181: --- In HFileOutputFormat2 we are doing this:
{code}
final boolean compactionExclude = conf.getBoolean(
    "hbase.mapreduce.hfileoutputformat.compaction.exclude", false);
...
w.appendFileInfo(StoreFile.EXCLUDE_FROM_MINOR_COMPACTION_KEY,
    Bytes.toBytes(compactionExclude));
{code}
So, it seems that we do allow bulk load files to be minor-compacted by default.
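Enis's reading of HFileOutputFormat2 can be combined with the `filterBulk` step in Vladimir's excerpt. The exclusion flag is stamped into each bulk-loaded file from the job configuration, with a default of `false`. Minor-compaction selection then drops only files whose flag is set. The model below assumes exactly that behavior. `StoreFile`, `writeBulkFile` and `filterBulk` here are simplified stand-ins, not the real HBase classes or methods:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BulkLoadExcludeSketch {
    record StoreFile(String name, boolean excludeFromMinorCompaction) {}

    // Models the excerpt: read the flag from the job configuration (default false)
    // and record it in the written file's metadata.
    static StoreFile writeBulkFile(String name, Map<String, String> jobConf) {
        boolean exclude = Boolean.parseBoolean(
                jobConf.getOrDefault("hbase.mapreduce.hfileoutputformat.compaction.exclude", "false"));
        return new StoreFile(name, exclude);
    }

    // Models the filterBulk step: only files whose flag is set are dropped
    // from the minor-compaction candidates.
    static List<StoreFile> filterBulk(List<StoreFile> candidates) {
        return candidates.stream()
                .filter(f -> !f.excludeFromMinorCompaction())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        StoreFile flushed = new StoreFile("flush-1", false);
        StoreFile bulkDefault = writeBulkFile("bulk-default", Map.of());
        StoreFile bulkExcluded = writeBulkFile("bulk-excluded",
                Map.of("hbase.mapreduce.hfileoutputformat.compaction.exclude", "true"));
        System.out.println(filterBulk(List.of(flushed, bulkDefault, bulkExcluded)));
    }
}
```

Only `bulk-excluded` is filtered out. The bulk file written with the default setting stays eligible for minor compaction, consistent with Enis's conclusion.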
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127426#comment-15127426 ] stack commented on HBASE-15181: --- Good stuff, [~vrodionov]. You too, [~claraxiong]. Suggest that a sentence on the bulk load concern make it into the release notes. Nice writeup... What is the 'Sliding Window Tiered Compaction'? I don't know what it is, so I don't know why "...sliding windows may trigger compaction too frequently and cause some files to be re-compacted." Is it the "...fully mimic the behavior of STCS and compact SSTables with a relative age difference less than a constant factor. " from the cited Spotify document? I see no engagement with stripe compactions in the writeup. Were they considered at all? (Stripe purportedly does best when the data is time-series shaped.) It would be good to at least call out how this differs. Suggest you give more direct credit to https://labs.spotify.com/2014/12/18/date-tiered-compaction/ . You do so for the copied image, but I find I have to read the original to understand what is being proposed, and it seems like a bunch of the notions and text come from it.
bq. And at the time of recovery, we may need to bulk load data.
Which recovery is this? And who is doing the bulk load? Do major compactions run in the date tiered scheme? As per [~vrodionov], we need the 'date' qualifier on configuration names... So, we have date tiering like in the Spotify blog, compacting in the first tier if above a configured threshold. For other tiers, we do default exploring compactions. A bulk load can ruin our tiering, but we'll just drop it in the tier that has its oldest timestamp? Major compactions do all tiers but the newest? (And when date tiered, should it be easier to drop whole files on TTL?) Let me look at the patch.
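On stack's TTL question: a file can be dropped without being rewritten only when even its newest cell has expired, that is, when its max timestamp is older than `now - ttl`. Date tiering tends to produce files with narrow, old time ranges that meet this condition. A file that mixes old and new cells still has to be compacted. A sketch of that check follows; the strict `<` cutoff, the `StoreFile` record and the sample files are assumptions for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

class TtlFileExpirySketch {
    record StoreFile(String name, long minTs, long maxTs) {}

    // Every cell in a file has expired once even its newest cell is older than
    // now - ttl, so the whole file can be dropped without rewriting it.
    static List<StoreFile> wholeFileExpired(List<StoreFile> files, long now, long ttlMillis) {
        long cutoff = now - ttlMillis;
        return files.stream().filter(f -> f.maxTs() < cutoff).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long now = 10_000, ttl = 5_000; // cutoff = 5_000
        List<StoreFile> files = List.of(
                new StoreFile("old-window", 1_000, 4_000),     // fully expired
                new StoreFile("mixed", 3_000, 7_000),          // only partly expired
                new StoreFile("recent-window", 8_000, 9_900)); // not expired
        System.out.println(wholeFileExpired(files, now, ttl));
    }
}
```

Only `old-window` is returned; `mixed` still holds unexpired cells.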
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127380#comment-15127380 ] Enis Soztutar commented on HBASE-15181: --- HBASE-7763 is the JIRA that discusses why we need to select a contiguous set of files for compaction. The main idea is that if two puts happen with the same timestamp, we order them by sequenceId so that the "latest" one is always returned. In some cases, for example, this allows the user to override a previously set value. The problem with non-contiguous compactions is that we do not keep the seqIds of cells forever. After some time, we remove per-cell seqIds and keep only one sequenceId per HFile. Thus, if two different puts with the same timestamp end up in files with different seqIds, allowing non-contiguous compactions may break the ordering. For example:
{code}
file1: seqId=10, row=foo, val=v1, ts=100
file2: seqId=20, row=bar, val=v2, ts=200
file3: seqId=30, row=foo, val=v3, ts=100
file4: seqId=40, row=bar, val=v4, ts=300
{code}
If I compact file1 and file4 together, then the new file will have
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127314#comment-15127314 ] Clara Xiong commented on HBASE-15181: - As to the second question: my implementation sorts store files by max timestamp. It is desirable not to compact a non-contiguous range of store files, to maximize scan performance by reducing time-range overlap. As to the more generic question about out-of-order data streams: they are assigned to compaction windows based on their max timestamps, not their sequence IDs. We pay a scan performance penalty only if a flushed file covers a much wider time range than the compaction windows. A generic solution for this would be to split off data that falls outside the compaction window, using multiple outputs from the compactor. As stated in the design spec, we don't see enough benefit in this solution yet.
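Clara's point that files are ordered by max timestamp rather than sequence ID can be illustrated with a late flush of out-of-order data. That flush has a later sequence ID than the older files, but it sorts next to the old data it belongs with. The `StoreFile` record, the sample files and the seqId tie-break are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MaxTimestampOrderSketch {
    record StoreFile(String name, long seqId, long minTs, long maxTs) {}

    // Orders files by max timestamp; ties are broken by seqId (an assumption here).
    static List<StoreFile> byMaxTimestamp(List<StoreFile> files) {
        List<StoreFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(StoreFile::maxTs)
                .thenComparingLong(StoreFile::seqId));
        return sorted;
    }

    public static void main(String[] args) {
        List<StoreFile> files = List.of(
                new StoreFile("flush-A", 1, 900, 1000),
                new StoreFile("flush-B", 2, 1000, 1100),
                new StoreFile("late-arrivals", 3, 850, 950), // out-of-order data, flushed third
                new StoreFile("flush-C", 4, 1100, 1200));
        for (StoreFile f : byMaxTimestamp(files)) {
            System.out.println(f.name() + " seqId=" + f.seqId() + " maxTs=" + f.maxTs());
        }
    }
}
```

The output order is `late-arrivals`, `flush-A`, `flush-B`, `flush-C`, even though `late-arrivals` was flushed third.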
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127491#comment-15127491 ] stack commented on HBASE-15181: --- Sounds right to me, [~davelatham]. OK, keep the sequenceId around always, [~enis]... then we could do non-contiguous compactions and respect the order in which edits were added.
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126993#comment-15126993 ] Vladimir Rodionov commented on HBASE-15181: --- {quote} If most bulk loads are for data for a limited time range then the files will naturally move into appropriate windows, and scans for recent data should include or skip them as appropriate. {quote} Still requires major compaction (of all files in a region).
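The quoted argument (bulk-loaded files "naturally move into appropriate windows" and scans "include or skip them") can be sketched as two checks. This is a hypothetical illustration: `BulkLoadWindowSketch`, `Window`, `FileRange`, `windowFor` and `mustRead` are invented names, and placing a file by its maximum timestamp is an assumption of this sketch rather than something the thread states.

```java
import java.util.List;

/** Hypothetical sketch: window placement and scan skipping for a (bulk-loaded) store file. */
public class BulkLoadWindowSketch {

    /** Half-open time window [start, end). */
    record Window(long start, long end) {
        boolean contains(long ts) {
            return ts >= start && ts < end;
        }
    }

    /** Time range covered by one store file, inclusive on both ends. */
    record FileRange(long minTs, long maxTs) {}

    /** Index of the window (newest first) containing the file's max timestamp, or -1 if none. */
    static int windowFor(FileRange f, List<Window> windowsNewestFirst) {
        for (int i = 0; i < windowsNewestFirst.size(); i++) {
            if (windowsNewestFirst.get(i).contains(f.maxTs())) {
                return i;
            }
        }
        return -1;
    }

    /** True if the file's time range overlaps the scan's range [scanStart, scanEnd). */
    static boolean mustRead(FileRange f, long scanStart, long scanEnd) {
        return f.maxTs() >= scanStart && f.minTs() < scanEnd;
    }

    public static void main(String[] args) {
        List<Window> windows =
            List.of(new Window(200, 300), new Window(100, 200), new Window(0, 100));
        FileRange bulk = new FileRange(120, 180); // bulk load covering a narrow, older range
        System.out.println(windowFor(bulk, windows));  // lands in window [100, 200)
        System.out.println(mustRead(bulk, 250, 300));  // a scan of recent data can skip it
    }
}
```

A bulk load spanning a wide time range would still overlap many windows and most scans, which is the case Vladimir's reply is concerned with.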
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127610#comment-15127610 ] Enis Soztutar commented on HBASE-15181: --- bq. Ok keep the sequenceid around always Enis Soztutar.. .then we could do non-contiguous compactions and respect order in which edits were added. Yes. I think the mvcc unification made this (keeping the seqId) 2 weeks, then Lars reduced it to 1 day or so because of the extra overhead. The downside is that we do not want to keep two 8-byte fields (ts and seqId) per cell. Time to unify seqId with ts. Is this needed? We already build a CompoundConfiguration in HStore from {{family.getConfiguration()}}.
{code}
+this.conf = new CompoundConfiguration().add(conf)
+.addBytesMap(storeConfigInfo.getHColumnDescriptor().getValues());
{code}
Can you also put up the patch on RB? It would be easier to review that way.
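Enis's question turns on how `CompoundConfiguration` layers settings: site-wide values with per-table and per-column-family overrides on top. A minimal sketch of that idea, using a hypothetical `LayeredConfigSketch` class and plain `Map`s instead of HBase's classes, and assuming (as in `CompoundConfiguration`) that layers added later take precedence. If HStore already builds such a layered view, wrapping it again in the policy adds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a shallow, layered configuration lookup (not HBase's class). */
public class LayeredConfigSketch {
    private final List<Map<String, String>> layers = new ArrayList<>();

    /** Adds a layer that overrides every layer added before it. */
    LayeredConfigSketch add(Map<String, String> layer) {
        layers.add(layer);
        return this;
    }

    /** Returns the value from the most recently added layer that defines the key. */
    Optional<String> get(String key) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            String v = layers.get(i).get(key);
            if (v != null) {
                return Optional.of(v);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String key = "hbase.hstore.compaction.tiered.min.threshold";
        LayeredConfigSketch conf = new LayeredConfigSketch()
            .add(Map.of(key, "3"))  // site-wide value (hbase-site)
            .add(Map.of())          // table-level overrides: none in this example
            .add(Map.of(key, "6")); // column-family override set via the shell
        System.out.println(conf.get(key).orElse("unset"));
    }
}
```

Here the column-family value `6` wins over the site value `3`, and a key missing from every layer yields an empty `Optional`.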
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127619#comment-15127619 ] Enis Soztutar commented on HBASE-15181: --- Can we re-use the already existing min threshold rather than introduce {{hbase.hstore.compaction.tiered.min.threshold}}? The same semantics would apply. bq. I see no engagement with stripe compactions in the write up. Were they considered at all (stripe purportedly does best when the data is timeseries shaped). Would be good to at least call out how this differs. This is a very good point. Is there a way we can override how stripes are defined (time-tiered ranges instead of row-range-based stripes) and have it share the same code? Maybe a pipe dream. cc [~sershe]. We already write multiple output files in the stripe compaction policy, both for compactions and for L0 flushes.
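Enis's suggestion of reusing the existing minimum-file threshold instead of a tiered-specific key can be sketched as follows. This assumes the existing key is `hbase.hstore.compaction.min` with a default of 3, uses a plain `Map` in place of Hadoop's `Configuration`, and `MinThresholdSketch`, `minThreshold` and `eligibleWindows` are hypothetical names; it only illustrates "same semantics": a window becomes a compaction candidate once it holds at least that many files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one shared min-file threshold applied per time window. */
public class MinThresholdSketch {
    static final String COMPACTION_MIN_KEY = "hbase.hstore.compaction.min";
    static final int COMPACTION_MIN_DEFAULT = 3; // assumed default for this sketch

    /** Reads the shared threshold from a plain map standing in for Configuration. */
    static int minThreshold(Map<String, String> conf) {
        String v = conf.get(COMPACTION_MIN_KEY);
        return v == null ? COMPACTION_MIN_DEFAULT : Integer.parseInt(v);
    }

    /** Given store-file counts per window (newest first), returns indexes meeting the threshold. */
    static List<Integer> eligibleWindows(int[] filesPerWindow, int minThreshold) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < filesPerWindow.length; i++) {
            if (filesPerWindow[i] >= minThreshold) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int threshold = minThreshold(Map.of(COMPACTION_MIN_KEY, "4"));
        System.out.println(eligibleWindows(new int[] {5, 2, 4, 1}, threshold));
    }
}
```

With a threshold of 4, windows 0 and 2 qualify and the program prints `[0, 2]`.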
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122082#comment-15122082 ] Ted Yu commented on HBASE-15181: Is this in production? If so, can you share performance numbers?
{code}
public static final String MAX_AGE = CONFIG_PREFIX + "tiered.max.storefile.age";
public static final String TIME_UNIT = CONFIG_PREFIX + "tiered.time.unit";
public static final String TIER_BASE = CONFIG_PREFIX + "tiered.tier.base";
public static final String MIN_THRESHOLD = CONFIG_PREFIX + "tiered.min.threshold";
{code}
Please add javadoc for the parameters above. Such constants normally end with '_KEY'. TieredCompactionPolicy.java needs the Apache license header. Please add an audience annotation and a class javadoc. Putting the next patch on Review Board would make reviewing easier.
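To make the parameters above concrete, here is a hypothetical sketch of date-tiered window boundaries. `TieredWindowsSketch` and `windows` are invented names, and the semantics are assumptions loosely mirroring the keys: `baseUnit` for `tiered.time.unit` (width of the newest windows), `tierBase` for `tiered.tier.base` (growth factor between tiers) and `maxAge` for `tiered.max.storefile.age` (how far back windows go). It is not the patch's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of date-tiered window boundaries (semantics assumed, not from the patch). */
public class TieredWindowsSketch {

    /** Half-open time window [start, end). */
    record Window(long start, long end) {}

    /**
     * Returns windows newest first. Windows start baseUnit wide; after at least tierBase
     * windows of one size, the width is multiplied by tierBase as soon as the boundary is
     * aligned to the wider size, so boundaries stay stable as time advances.
     */
    static List<Window> windows(long now, long baseUnit, int tierBase, long maxAge) {
        if (baseUnit <= 0 || tierBase < 2) {
            throw new IllegalArgumentException("baseUnit must be > 0 and tierBase >= 2");
        }
        List<Window> out = new ArrayList<>();
        long oldest = now - maxAge;
        long size = baseUnit;
        long end = (Math.floorDiv(now, size) + 1) * size; // end of the window holding "now"
        int windowsInTier = 0;
        while (end > oldest) {
            long nextSize = size * tierBase;
            if (windowsInTier >= tierBase && Math.floorMod(end, nextSize) == 0) {
                size = nextSize; // promote to the next, wider tier
                windowsInTier = 0;
            }
            long start = end - size;
            out.add(new Window(start, end));
            end = start;
            windowsInTier++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Abstract time units: now = 10, base window = 1, tier base = 2, max age = 10.
        windows(10, 1, 2, 10).forEach(System.out::println);
    }
}
```

With these inputs the sketch yields [10,11), [9,10), [8,9), [6,8), [4,6), [0,4): narrow windows for recent data and progressively wider ones further back. Waiting for alignment before widening is a design choice of this sketch; it keeps a given timestamp in the same window as `now` moves forward, instead of sliding boundaries on every tick.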
[jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
[ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122174#comment-15122174 ] Ted Yu commented on HBASE-15181: I went over the linked doc. bq. For other tiers, we apply the exploring compaction using a small file count. Can you give a bit more detail on this small file count? Does it appear in the tables at the end of the doc? How do you handle bulk loaded hfiles (in terms of maintaining window boundaries)?