[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-05-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: stripe-cdf.pdf

At a recent HBase meeting, [~jmhsieh] asked me to provide an 
easier-to-understand perf chart.
I haven't run new experiments since then, and setting up new ones will take 
some time (because I want to get good ones to use for con slides :)). For now 
I'm attaching a primitive one I made out of old data, for reads using 
LoadTestTool against a default-compacted and a stripe-compacted table, with 
500 data points for each.

The experiment setup is described in the perf doc; it is the one on c1.xlarge 
instances. It compares a fixed 10-stripe scheme against the default scheme, 
with 3 relatively large regions (growing to several gigs) and interleaved 
batches of writes and reads.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: stripe-cdf.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compaction perf evaluation.pdf, Stripe compaction perf evaluation.pdf, 
 Stripe compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, 
 Stripe compactions.pdf, Using stripe compactions.pdf, Using stripe 
 compactions.pdf, Using stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the LevelDB doc about how LevelDB range 
 overlap and data mixing break seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid the LevelDB I/O 
 multiplication factor.
 So I suggest the following idea; let's call it stripe compactions. It's a 
 mix between LevelDB ideas and having many small regions.
 It gives us a subset of the benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (management overhead and the 
 current memstore/etc. limitations).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into a configurable number of 
 fixed-boundary stripes (determined the first time we stripe the data, see 
 below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDB, or to the current files.
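
As a rough illustration of the fixed-boundary layout described above, here is a minimal Python sketch of mapping a row key to its stripe. The function name `stripe_index`, the sample boundaries, and the choice to treat each boundary as the first key of the next stripe are all assumptions made for illustration; none of this is taken from the HBase patch.

```python
from bisect import bisect_right

def stripe_index(boundaries, row_key):
    """Return the index of the stripe that owns row_key.

    boundaries holds N-1 sorted split keys for N stripes. Assumption for this
    sketch: a boundary key belongs to the stripe on its right, so stripe i
    covers [boundaries[i-1], boundaries[i]); the first and last stripes are
    open-ended.
    """
    return bisect_right(boundaries, row_key)

# Hypothetical split keys giving 4 stripes.
boundaries = [b"g", b"n", b"t"]
owner_of_m = stripe_index(boundaries, b"m")  # falls between b"g" and b"n"
```

Because the boundaries are fixed, the lookup is a binary search and needs no per-file metadata.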
 The compaction policy does 3 types of compactions.
 The first is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
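
The L0 compaction step can be sketched as a merge of the L0 files followed by a split on the stripe boundaries. This is a simplified model, not the actual implementation: cells are plain `(row_key, seq_num, value)` tuples, files are in-memory lists, and `compact_l0` is a hypothetical name.

```python
from bisect import bisect_right
import heapq

def compact_l0(l0_files, boundaries):
    """Merge sorted L0 files and split the result into one output per stripe.

    Each L0 file is a list of (row_key, seq_num, value) tuples, assumed to be
    sorted by row_key and, within a row_key, by descending seq_num.
    """
    outputs = [[] for _ in range(len(boundaries) + 1)]
    # Merge by key; for equal keys the newest seq_num comes first.
    for cell in heapq.merge(*l0_files, key=lambda c: (c[0], -c[1])):
        outputs[bisect_right(boundaries, cell[0])].append(cell)
    return outputs

l0_a = [(b"a", 1, "x"), (b"p", 1, "y")]
l0_b = [(b"a", 2, "x2"), (b"h", 2, "z")]
striped = compact_l0([l0_a, l0_b], [b"g", b"n"])
```

After the call, every cell sits in exactly one stripe output and no L0 data remains, which is the logical outcome the description asks for.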
 The second is exactly like the current compaction, but compacts one single 
 stripe. In the future, nothing prevents us from applying compaction rules 
 and compacting part of the stripe (e.g. similar to the current policy with 
 ratios and stuff, tiers, whatever), but for the first cut I'd argue we should 
 let it major compact the entire stripe. Or just have the ratio and no more 
 complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here: if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take a ridiculous amount of I/O.
 If we take many stripes, we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
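
A minimal sketch of the third compaction type, under simplifying assumptions: stripes are modeled as sorted lists of unique row keys, the selected range holds at least as many keys as stripes, and the new interior boundaries are chosen to equalize key counts. The name `rebalance` and the equal-count heuristic are illustrative only; the description leaves the real heuristics open.

```python
def rebalance(stripes, boundaries, start, count):
    """Rewrite `count` adjacent stripes, beginning at `start`, with new boundaries.

    stripes: list of sorted key lists; boundaries: len(stripes) - 1 split keys.
    Only the interior boundaries of the selected range change; the outer
    boundaries and all other stripes are left untouched.
    """
    # Adjacent stripes cover adjacent key ranges, so concatenation stays sorted.
    merged = [key for stripe in stripes[start:start + count] for key in stripe]
    cuts = [len(merged) * i // count for i in range(1, count)]
    new_inner = [merged[c] for c in cuts]  # each boundary is the first key of the next stripe
    edges = [0] + cuts + [len(merged)]
    rewritten = [merged[lo:hi] for lo, hi in zip(edges, edges[1:])]
    new_stripes = stripes[:start] + rewritten + stripes[start + count:]
    new_boundaries = boundaries[:start] + new_inner + boundaries[start + count - 1:]
    return new_stripes, new_boundaries

stripes = [["a", "b", "c", "d", "e", "f"], ["h"], ["p", "q"]]
new_stripes, new_boundaries = rebalance(stripes, ["g", "n"], start=0, count=2)
```

The I/O cost is proportional to the size of `merged`, which is the tradeoff noted above: a small `count` keeps each rewrite cheap but may need many passes, while a large `count` approaches a full major compaction.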
 In general, if we let L0 grow before determining the stripes, we will get 
 better boundaries.
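
To make the point about letting L0 grow concrete, the sketch below picks boundaries as equal-count quantiles of the keys seen so far and then measures how the full data set would spread across the resulting stripes. The integer keys are a deliberately skewed toy data set standing in for row keys, and `pick_boundaries` and `stripe_sizes` are hypothetical helpers, not part of HBase.

```python
from bisect import bisect_right

def pick_boundaries(sorted_keys, stripe_count):
    """Pick stripe_count - 1 split keys dividing sorted_keys into near-equal parts."""
    n = len(sorted_keys)
    return [sorted_keys[n * i // stripe_count] for i in range(1, stripe_count)]

def stripe_sizes(boundaries, keys):
    """Count how many keys land in each stripe for the given boundaries."""
    sizes = [0] * (len(boundaries) + 1)
    for key in keys:
        sizes[bisect_right(boundaries, key)] += 1
    return sizes

all_keys = list(range(1000))
early = pick_boundaries(all_keys[:100], 4)  # boundaries chosen from a small L0
late = pick_boundaries(all_keys, 4)         # boundaries chosen after L0 has grown
early_sizes = stripe_sizes(early, all_keys)
late_sizes = stripe_sizes(late, all_keys)
```

With this toy input the early boundaries leave almost all of the data in the last stripe, while the later ones split it evenly.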
 Also, unless the imbalance is really large, we don't really need to rebalance.
 Obviously this scheme (as well as level) is not applicable to all scenarios, 
 e.g. if the timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
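
The read-path claim in the list above (one stripe plus L0) can be sketched as a file-selection function. File names are placeholder strings and `files_for_get` is a hypothetical helper; the real store file manager is not shown in this thread.

```python
from bisect import bisect_right

def files_for_get(row_key, boundaries, stripe_files, l0_files):
    """Return the files a point read must consult: every L0 file plus one stripe's files."""
    return l0_files + stripe_files[bisect_right(boundaries, row_key)]

boundaries = [b"g", b"n"]
stripe_files = [["s0-a", "s0-b"], ["s1-a"], ["s2-a"]]
l0_files = ["l0-1"]
to_read = files_for_get(b"h", boundaries, stripe_files, l0_files)
```

Files belonging to other stripes are never opened for this key, so the number of files per read stays small even when the region as a whole holds many files.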
 The main advantage over Level (for HBase) is that the default store can still 
 open the files and get correct results; there are no range overlap 
 shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Using stripe compactions.pdf

First draft of the user-level doc. After trying to describe the size-based 
scheme, I think it should be improved. I will do that. Meanwhile there's a 
design doc and a user doc, so I'd like to get some reviews ;)
I will rebase and update all patches between now and Monday. [~stack] 
[~mbertozzi] what do you guys think?


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Using stripe compactions.pdf


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: (was: Using stripe compactions.pdf)


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Using stripe compactions.pdf


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-04-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Stripe compaction perf evaluation.pdf

Updating the perf evaluation; I think I'm done with that for now. Looking for 
CRs :)
I will not have time for the next few days, but I will get to the noted 
optimizations (L0) after that.


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-03-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Stripe compaction perf evaluation.pdf
Stripe compactions.pdf

Updating both docs: the size-based logic test results, as well as design 
improvements based on them.


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-03-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Stripe compaction perf evaluation.pdf

perf doc... size test is not finished yet.

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compaction perf evaluation.pdf, Stripe 
 compactions.pdf, Stripe compactions.pdf


 So I was thinking about having many regions as the way to make compactions 
 more manageable, and writing the level db doc about how level db range 
 overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
 Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
 factor.
 And I suggest the following idea, let's call it stripe compactions. It's a 
 mix between level db ideas and having many small regions.
 It allows us to have a subset of benefits of many regions (wrt reads and 
 compactions) without many of the drawbacks (managing and current 
 memstore/etc. limitation).
 It also doesn't break seqNum-based file sorting for any one key.
 It works like this.
 The region key space is separated into configurable number of fixed-boundary 
 stripes (determined the first time we stripe the data, see below).
 All the data from memstores is written to normal files with all keys present 
 (not striped), similar to L0 in LevelDb, or current files.
 Compaction policy does 3 types of compactions.
 First is L0 compaction, which takes all L0 files and breaks them down by 
 stripe. It may be optimized by adding more small files from different 
 stripes, but the main logical outcome is that there are no more L0 files and 
 all data is striped.
 Second is exactly similar to current compaction, but compacting one single 
 stripe. In future, nothing prevents us from applying compaction rules and 
 compacting part of the stripe (e.g. similar to current policy with rations 
 and stuff, tiers, whatever), but for the first cut I'd argue let it major 
 compact the entire stripe. Or just have the ratio and no more complexity.
 Finally, the third addresses the concern of the fixed boundaries causing 
 stripes to be very unbalanced.
 It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
 results out with different boundaries.
 There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
 will be smaller but rebalancing will take a ridiculous amount of I/O.
 If we take many stripes we are essentially getting into the 
 epic-major-compaction problem again. Some heuristics will have to be in place.
 In general, if we initially let L0 grow before determining the stripes, we 
 will get better boundaries.
 Also, unless the imbalance is really large we don't really need to rebalance.
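
The third (rebalancing) compaction type can be sketched like this. The description does not say how the new boundaries are chosen, so the even split by row count below is purely an assumption for illustration, and `rebalance` is a hypothetical name. Stripes are modelled as sorted lists of row keys rather than real store files.

```python
def rebalance(stripes, boundaries, first, count):
    """Merge `count` adjacent stripes starting at `first`, then re-split evenly.

    stripes:    list of sorted row-key lists, one per stripe
    boundaries: sorted split keys; boundaries[i] is the first key of stripe i + 1
    Returns (new_stripes, new_boundaries); stripes outside the merged
    range and the outer boundaries of that range are left untouched.
    """
    # Adjacent stripes are disjoint and ordered, so concatenation stays sorted.
    merged = [key for stripe in stripes[first:first + count] for key in stripe]
    n = len(merged)
    if n < count:
        raise ValueError("not enough rows to fill every output stripe")
    # Assumed heuristic: give every output stripe roughly the same row count.
    parts = [merged[i * n // count:(i + 1) * n // count] for i in range(count)]
    inner = [part[0] for part in parts[1:]]
    new_stripes = stripes[:first] + parts + stripes[first + count:]
    new_boundaries = boundaries[:first] + inner + boundaries[first + count - 1:]
    return new_stripes, new_boundaries

# A hypothetical unbalanced layout: stripe 0 holds six rows, stripe 1 holds one.
stripes = [[b"a", b"b", b"c", b"d", b"e", b"f"], [b"h"], [b"p", b"q"]]
new_stripes, new_boundaries = rebalance(stripes, [b"g", b"n"], first=0, count=2)
```

Only the two merged stripes are rewritten here, which shows the tradeoff mentioned above: a small `count` keeps each compaction cheap but moves the imbalance only one stripe at a time.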
 Obviously this scheme (as well as level) is not applicable for all scenarios, 
 e.g. if timestamp is your key it completely falls apart.
 The end result:
 - many small compactions that can be spread out in time.
 - reads still read from a small number of files (one stripe + L0).
 - region splits become marvelously simple (if we could move files between 
 regions, no references would be needed).
 The main advantage over Level (for HBase) is that the default store can still 
 open the files and get correct results - there are no range overlap shenanigans.
 It also needs no metadata, although we may record some for convenience.
 It also would appear to not cause as much I/O.
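
To make the "one stripe + L0" read path from the list above concrete, here is a small sketch of a point lookup. Files are modelled as dictionaries from row key to `(seq_num, value)`, and the `get` helper is hypothetical; it is not the HBase store API.

```python
from bisect import bisect_right

def get(row_key, boundaries, stripe_files, l0_files):
    """Return the newest value for row_key, or None if it is absent.

    stripe_files: one list of files per stripe
    l0_files:     unstriped files flushed from the memstore
    Each file is a dict mapping row_key -> (seq_num, value).
    """
    # Only L0 and the single owning stripe can contain the key.
    candidates = l0_files + stripe_files[bisect_right(boundaries, row_key)]
    best = None
    for store_file in candidates:
        cell = store_file.get(row_key)
        # The highest seqNum wins, as with the current store files.
        if cell is not None and (best is None or cell[0] > best[0]):
            best = cell
    return None if best is None else best[1]

# Hypothetical layout: two stripes split at b"g", plus one L0 file.
boundaries = [b"g"]
stripe_files = [[{b"a": (1, "old")}], [{b"h": (2, "x")}]]
l0_files = [{b"a": (5, "new")}]
```

Since every file for a given key lives either in L0 or in exactly one stripe, ordering by seqNum is enough to pick the right version, with no range-overlap handling.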

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-03-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Attachment: Stripe compactions.pdf

updated doc

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: Stripe compactions.pdf, Stripe compactions.pdf





[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-02-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Component/s: Compaction

 Support stripe compaction
 -

 Key: HBASE-7667
 URL: https://issues.apache.org/jira/browse/HBASE-7667
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin




[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Description: 
So I was thinking about having many regions as the way to make compactions more 
manageable, and writing the level db doc about how level db range overlap and 
data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and 
Ted, and thinking about how to avoid Level DB I/O multiplication factor.

And I suggest the following idea, let's call it stripe compactions. It's a mix 
between level db ideas and having many small regions.
It allows us to have a subset of benefits of many regions (wrt reads and 
compactions) without many of the drawbacks (managing and current memstore/etc. 
limitation).
It also doesn't break seqNum-based file sorting for any one key.
It works like this.
The region key space is separated into configurable number of fixed-boundary 
stripes (determined the first time we stripe the data, see below).
All the data from memstores is written to normal files with all keys present 
(not striped), similar to L0 in LevelDb, or current files.
Compaction policy does 3 types of compactions.
First is L0 compaction, which takes all L0 files and breaks them down by 
stripe. It may be optimized by adding more small files from different stripes, 
but the main logical outcome is that there are no more L0 files and all data is 
striped.
Second is exactly similar to current compaction, but compacting one single 
stripe. In future, nothing prevents us from applying compaction rules and 
compacting part of the stripe (e.g. similar to current policy with ratios and 
stuff, tiers, whatever), but for the first cut I'd argue let it major compact 
the entire stripe. Or just have the ratio and no more complexity.
Finally, the third addresses the concern of the fixed boundaries causing 
stripes to be very unbalanced.
It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
results out with different boundaries.
There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
will be smaller but rebalancing will take ridiculous amount of I/O.
If we take many stripes we are essentially getting into the 
epic-major-compaction problem again. Some heuristics will have to be in place.
In general, if, before stripes are determined, we initially let L0 grow before 
determining the stripes, we will get better boundaries.
Also, unless unbalancing is really large we don't need to rebalance really.
Obviously this scheme (as well as level) is not applicable for all scenarios, 
e.g. if timestamp is your key it completely falls apart.

The end result:
- many small compactions that can be spread out in time.
- reads still read from a small number of files (one stripe + L0).
- region splits become marvelously simple (if we could move files between 
regions, no references would be needed).
Main advantage over Level (for HBase) is that default store can still open the 
files and get correct results - there are no range overlap shenanigans.
It also needs no metadata, although we may record some for convenience.


  was:
So I was thinking about having many regions as the way to make compactions more 
manageable, and writing the level db doc about how level db range overlap and 
data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and 
Ted, and thinking about how to avoid Level DB I/O multiplication factor.

And I suggest the following idea, let's call it stripe compactions. It's a mix 
between level db ideas and having many small regions.
It allows us to have a subset of benefits of many regions (wrt reads and 
compactions) without many of the drawbacks (managing and current memstore/etc. 
limitation).
It also doesn't break seqNum-based file sorting for any one key.
It works like this.
The key space is separated into configurable number of fixed-boundary stripes.
All the data from memstores is written to normal files with all keys present 
(not striped), similar to L0 in LevelDb, or current files.
Compaction policy does 3 types of compactions.
First is L0 compaction, which takes all L0 files and breaks them down by 
stripe. It may be optimized by adding more small files from different stripes, 
but the main logical outcome is that there are no more L0 files and all data is 
striped.
Second is exactly similar to current compaction, but compacting the entire 
stripe. In future, nothing prevents us from applying compaction rules and 
compacting part of the stripe (e.g. similar to current policy with ratios and 
stuff, tiers, whatever), but for the first cut I'd argue let it major compact 
the entire stripe. Or just have the ratio and no more complexity.
Finally, the third addresses the concern of the fixed boundaries causing 
stripes to be very unbalanced.
It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
results out with 

[jira] [Updated] (HBASE-7667) Support stripe compaction

2013-01-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7667:


Description: 
So I was thinking about having many regions as the way to make compactions more 
manageable, and writing the level db doc about how level db range overlap and 
data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and 
Ted, and thinking about how to avoid Level DB I/O multiplication factor.

And I suggest the following idea, let's call it stripe compactions. It's a mix 
between level db ideas and having many small regions.
It allows us to have a subset of benefits of many regions (wrt reads and 
compactions) without many of the drawbacks (managing and current memstore/etc. 
limitation).
It also doesn't break seqNum-based file sorting for any one key.
It works like this.
The region key space is separated into configurable number of fixed-boundary 
stripes (determined the first time we stripe the data, see below).
All the data from memstores is written to normal files with all keys present 
(not striped), similar to L0 in LevelDb, or current files.
Compaction policy does 3 types of compactions.
First is L0 compaction, which takes all L0 files and breaks them down by 
stripe. It may be optimized by adding more small files from different stripes, 
but the main logical outcome is that there are no more L0 files and all data is 
striped.
Second is exactly similar to current compaction, but compacting one single 
stripe. In future, nothing prevents us from applying compaction rules and 
compacting part of the stripe (e.g. similar to current policy with ratios and 
stuff, tiers, whatever), but for the first cut I'd argue let it major compact 
the entire stripe. Or just have the ratio and no more complexity.
Finally, the third addresses the concern of the fixed boundaries causing 
stripes to be very unbalanced.
It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
results out with different boundaries.
There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
will be smaller but rebalancing will take ridiculous amount of I/O.
If we take many stripes we are essentially getting into the 
epic-major-compaction problem again. Some heuristics will have to be in place.
In general, if, before stripes are determined, we initially let L0 grow before 
determining the stripes, we will get better boundaries.
Also, unless unbalancing is really large we don't need to rebalance really.
Obviously this scheme (as well as level) is not applicable for all scenarios, 
e.g. if timestamp is your key it completely falls apart.

The end result:
- many small compactions that can be spread out in time.
- reads still read from a small number of files (one stripe + L0).
- region splits become marvelously simple (if we could move files between 
regions, no references would be needed).
Main advantage over Level (for HBase) is that default store can still open the 
files and get correct results - there are no range overlap shenanigans.
It also needs no metadata, although we may record some for convenience.
It also would appear to not cause as much I/O.

  was:
So I was thinking about having many regions as the way to make compactions more 
manageable, and writing the level db doc about how level db range overlap and 
data mixing breaks seqNum sorting, and discussing it with Jimmy, Matteo and 
Ted, and thinking about how to avoid Level DB I/O multiplication factor.

And I suggest the following idea, let's call it stripe compactions. It's a mix 
between level db ideas and having many small regions.
It allows us to have a subset of benefits of many regions (wrt reads and 
compactions) without many of the drawbacks (managing and current memstore/etc. 
limitation).
It also doesn't break seqNum-based file sorting for any one key.
It works like this.
The region key space is separated into configurable number of fixed-boundary 
stripes (determined the first time we stripe the data, see below).
All the data from memstores is written to normal files with all keys present 
(not striped), similar to L0 in LevelDb, or current files.
Compaction policy does 3 types of compactions.
First is L0 compaction, which takes all L0 files and breaks them down by 
stripe. It may be optimized by adding more small files from different stripes, 
but the main logical outcome is that there are no more L0 files and all data is 
striped.
Second is exactly similar to current compaction, but compacting one single 
stripe. In future, nothing prevents us from applying compaction rules and 
compacting part of the stripe (e.g. similar to current policy with ratios and 
stuff, tiers, whatever), but for the first cut I'd argue let it major compact 
the entire stripe. Or just have the ratio and no more complexity.
Finally, the third addresses the concern of the fixed boundaries causing 
stripes to be very