[jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction

2016-02-29 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173181#comment-15173181
 ] 

Duo Zhang commented on HBASE-15339:
---

For example, we usually consider hot data to be 'data written in the last 
7 days', not 'data written since last Monday'. That is why a moving window 
is more suitable for determining hot data.
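
To make the difference concrete, here is a minimal sketch of the two definitions. The function names and the fixed 7-day span are assumptions for illustration only, not anything from the patch.

```python
from datetime import datetime, timedelta, timezone

HOT_SPAN = timedelta(days=7)  # assumed hot-data span, taken from the example above


def is_hot_moving(written_at: datetime, now: datetime) -> bool:
    """Hot if written within the last 7 days, measured back from `now`."""
    return now - HOT_SPAN <= written_at <= now


def is_hot_since_monday(written_at: datetime, now: datetime) -> bool:
    """Hot if written at or after 00:00 on the most recent Monday."""
    monday = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return monday <= written_at <= now


# Monday 2016-02-29 09:00 UTC, looking at data written the previous Saturday.
now = datetime(2016, 2, 29, 9, 0, tzinfo=timezone.utc)
saturday = datetime(2016, 2, 27, 12, 0, tzinfo=timezone.utc)
print(is_hot_moving(saturday, now))        # True
print(is_hot_since_monday(saturday, now))  # False
```

With the calendar-anchored definition, data that is only two days old stops being hot as soon as the week rolls over, which is the behaviour a moving window avoids.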

As for the archive logic, there is a max age config, and we skip compaction 
on files older than that. I think we could archive those files?
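
Roughly what I have in mind, as a sketch only. `StoreFile`, `max_timestamp_ms` and `split_by_max_age` are hypothetical names for illustration and are not the HBase API; the real max age check in the policy may differ in detail.

```python
from typing import List, NamedTuple, Tuple

DAY_MS = 24 * 60 * 60 * 1000


class StoreFile(NamedTuple):
    """Hypothetical stand-in for a store file; not the HBase class."""
    name: str
    max_timestamp_ms: int  # newest cell timestamp in the file


def split_by_max_age(files: List[StoreFile], now_ms: int,
                     max_age_ms: int) -> Tuple[List[StoreFile], List[StoreFile]]:
    """Split files into (still compactable, archive candidates) at now - max age."""
    cutoff = now_ms - max_age_ms
    compactable = [f for f in files if f.max_timestamp_ms >= cutoff]
    archive = [f for f in files if f.max_timestamp_ms < cutoff]
    return compactable, archive


files = [StoreFile("recent", 990 * DAY_MS), StoreFile("old", 100 * DAY_MS)]
compactable, archive = split_by_max_age(files, now_ms=1000 * DAY_MS,
                                        max_age_ms=365 * DAY_MS)
print([f.name for f in compactable], [f.name for f in archive])
```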

Thanks.

> Add archive tiers for date based tiered compaction
> --
>
> Key: HBASE-15339
> URL: https://issues.apache.org/jira/browse/HBASE-15339
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Duo Zhang
>
> For our MiCloud service, old data is rarely touched but we still need to 
> keep it, so we want to put that data on inexpensive devices and reduce 
> redundancy using EC to cut down the cost.
> With the date based tiered compaction introduced in HBASE-15181, new data and 
> old data can be placed in different tiers. But the tier boundary moves as time 
> elapses, so it is still possible that we compact the old tier, which breaks 
> our block moving and EC work.
> So here we want to introduce an "archive tier" to better fit our scenario. 
> Add a configuration called "archive unit", for example, year. That means, if 
> we find that the tier boundary is already in the previous year, then we reset 
> the boundaries to the start and end of that year, and if we want to do 
> compaction in this tier, we just compact all files into one file. The file will 
> never be changed unless we force a major compaction, so it is safe to apply EC 
> and other cost reducing approaches to the file. And we make more tiers before 
> this tier, year by year. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction

2016-02-29 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172173#comment-15172173
 ] 

Dave Latham commented on HBASE-15339:
-

Why does MiCloud need a moving window?  The idea of using different tiers is 
that recent windows are in the lower tier, so that if recent data is "hot" then 
it is well partitioned and efficiently read.

Other windowing strategies do sound interesting though, please share what you 
come up with.



[jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction

2016-02-27 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170916#comment-15170916
 ] 

Duo Zhang commented on HBASE-15339:
---

OK, I went through the patch. The window generating algorithm is 
interesting (unix time divided by window size).
But in fact, MiCloud needs a moving window to determine hot data, and fixed 
windows to archive old data (preferably by year).
Luckily we have a {{Window}} class here. I think we can make Window an 
interface and give it several different implementations.
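
A rough sketch of the shape I am thinking of, written in Python for brevity. Only the name {{Window}} comes from the patch; the method names and both implementations below are assumptions for illustration, not the patch's code.

```python
from abc import ABC, abstractmethod

DAY_MS = 24 * 60 * 60 * 1000


class Window(ABC):
    """Hypothetical interface: anything that can say whether it covers a timestamp."""

    @abstractmethod
    def contains(self, ts_ms: int) -> bool:
        ...


class EpochAlignedWindow(Window):
    """Fixed window whose index is the unix time divided by the window size."""

    def __init__(self, ts_ms: int, size_ms: int):
        self.size_ms = size_ms
        self.index = ts_ms // size_ms  # integer division picks the window

    def start_ms(self) -> int:
        return self.index * self.size_ms

    def end_ms(self) -> int:
        return (self.index + 1) * self.size_ms

    def contains(self, ts_ms: int) -> bool:
        return self.start_ms() <= ts_ms < self.end_ms()


class MovingWindow(Window):
    """Window covering [now - span, now], so its boundaries move with `now`."""

    def __init__(self, now_ms: int, span_ms: int):
        self.now_ms = now_ms
        self.span_ms = span_ms

    def contains(self, ts_ms: int) -> bool:
        return self.now_ms - self.span_ms <= ts_ms <= self.now_ms


fixed = EpochAlignedWindow(10 * DAY_MS + 5, DAY_MS)
hot = MovingWindow(now_ms=10 * DAY_MS + 5, span_ms=7 * DAY_MS)
print(fixed.start_ms() == 10 * DAY_MS, hot.contains(3 * DAY_MS + 5))
```

A year-based archive window would be one more implementation behind the same interface.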

Will be back later when I find a way to integrate my logic into the compaction 
policy. Thanks.



[jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction

2016-02-26 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170083#comment-15170083
 ] 

Duo Zhang commented on HBASE-15339:
---

Oh, I see the picture in the design doc; it is not the same as Cassandra's 
DTCS. Fixed windows are enough, although aligning them with the calendar looks 
better and has some benefits for statistics. And for our needs, we have 
replication, so we need to make sure the boundaries are the same on all 
clusters connected by replication. This makes it easier to verify data 
consistency between these clusters.

Thanks [~davelatham], that helps a lot. Let me study the patch carefully and 
update the description here. As for the config, our service only needs 
calendar aligned boundaries in the max tier. We do not need to archive data by 
day, then merge days into weeks, then months, quarters and years. We still have 
a hot data tier which is relative to now (typically, one week).
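
As an illustration of calendar aligned boundaries for the max tier, here is a small sketch. The function name is made up, and using UTC for the year boundary is an assumption; the point is only that the result depends on the timestamp alone, so every cluster computes the same boundaries.

```python
from datetime import datetime, timezone
from typing import Tuple


def year_window_ms(ts_ms: int) -> Tuple[int, int]:
    """Return [start, end) in epoch millis of the UTC calendar year containing ts_ms."""
    year = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc).year
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return int(start.timestamp()) * 1000, int(end.timestamp()) * 1000


# A cell written in mid-June 2015 falls into the 2015 archive window.
ts = int(datetime(2015, 6, 15, tzinfo=timezone.utc).timestamp()) * 1000
print(year_window_ms(ts))  # (1420070400000, 1451606400000)
```

Neither "now" nor any per-cluster state appears in the calculation, which is what makes the boundaries easy to compare across replicated clusters.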



[jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction

2016-02-26 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169220#comment-15169220
 ] 

Dave Latham commented on HBASE-15339:
-

Duo, I'd love to understand this a little better.  The tiered compaction in 
HBASE-15181 has a max tier, so once data reaches that tier it never need be 
compacted again unless you force a major compaction.  The windows in that tier 
are fixed, based on epoch time, and their boundaries won't move.  They are not, 
however, aligned with the calendar, so if that is what you need, then you 
definitely need an enhancement.  I could imagine a config to use 
days/weeks/months/quarters/years for example instead of the simple epoch 
exponential tier schedule of HBASE-15181.  Can you elaborate on your needs and 
your proposal?
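
To show what "not aligned with the calendar" means in practice, here is a quick sketch. It assumes a plain 7-day window computed by dividing epoch time by the window size; the helper name is made up and this is not the code from HBASE-15181.

```python
from datetime import datetime, timezone

WEEK_MS = 7 * 24 * 60 * 60 * 1000


def epoch_window_start(ts_ms: int, window_ms: int) -> datetime:
    """Start of the epoch-aligned window containing ts_ms, as a UTC datetime."""
    start_ms = (ts_ms // window_ms) * window_ms
    return datetime.fromtimestamp(start_ms // 1000, tz=timezone.utc)


# Friday 2016-02-26 12:00 UTC
ts = int(datetime(2016, 2, 26, 12, 0, tzinfo=timezone.utc).timestamp()) * 1000
start = epoch_window_start(ts, WEEK_MS)
print(start.date(), start.weekday())  # 2016-02-25 3 (Thursday; Monday is 0)
```

Because 1970-01-01 was a Thursday, every 7-day epoch window starts on a Thursday at 00:00 UTC. The boundaries are fixed, but they do not line up with calendar weeks, months or years.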
