[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Antonov updated HBASE-13031:
------------------------------------
    Fix Version/s:     (was: 0.98.18)
                       (was: 1.3.0)
                       (was: 2.0.0)

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>
> I posted this on the mailing list and it seems some people are interested.
> A little background for everyone.
> We have a very large table that we would like to snapshot and transfer to
> another cluster (compressed data is always better to ship). Our problem is
> that it could take many weeks to transfer all of the data, and during that
> time, with major compactions, the data stored in DFS has the potential to
> double, which would cause us to run out of disk space.
> So we were thinking about adding the ability to snapshot a specific key
> range.
> Ideally, the user would specify a start and stop key, and each of those
> would be associated with a region boundary. If the boundaries change (due
> to merging or splitting of regions) between the time the user submits the
> request and the time the snapshot is taken, the snapshot should fail.
> We would know which regions to snapshot, and if those changed between when
> the request was submitted and when the regions were locked, the snapshot
> could simply fail and the user would try again, instead of potentially
> giving the user more or less than what they had anticipated. I was planning
> on storing the start / stop key in the SnapshotDescription, and from there
> it looks pretty straightforward: we just have to change the verifier code
> to accommodate the key ranges.
> If this design sounds good to anyone, or if I am overlooking anything,
> please let me know. Once we agree on the design, I'll write and submit the
> patches.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
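
The design described in the issue (a start/stop key that must line up with region boundaries, and a snapshot that fails if those boundaries change before the regions are locked) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch and not HBase's API: `Region`, `selectRegions`, and `boundariesUnchanged` are hypothetical names, the region list is assumed to be sorted by start key, and an empty key is assumed to mean "unbounded" at either end of the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeyRangeSnapshotSketch {

    /** Hypothetical stand-in for a region covering [startKey, endKey). */
    static final class Region {
        final byte[] startKey;
        final byte[] endKey;

        Region(String start, String end) {
            this.startKey = bytes(start);
            this.endKey = bytes(end);
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Returns the regions that cover exactly [start, stop).
     * Throws if either key is not a region boundary.
     * Assumes sortedRegions is ordered by start key.
     */
    static List<Region> selectRegions(List<Region> sortedRegions, byte[] start, byte[] stop) {
        List<Region> selected = new ArrayList<>();
        boolean startAligned = false;
        boolean stopAligned = false;
        for (Region r : sortedRegions) {
            if (Arrays.equals(r.startKey, start)) {
                startAligned = true;
            }
            if (startAligned && !stopAligned) {
                selected.add(r);
                if (Arrays.equals(r.endKey, stop)) {
                    stopAligned = true;
                }
            }
        }
        if (!startAligned || !stopAligned) {
            throw new IllegalArgumentException("key range does not align with region boundaries");
        }
        return selected;
    }

    /**
     * Re-resolves the range against the current layout and reports whether it
     * still maps to the same region boundaries that were seen at request time.
     * A false result means the snapshot should fail and be retried.
     */
    static boolean boundariesUnchanged(List<Region> atRequest, List<Region> current,
                                       byte[] start, byte[] stop) {
        List<Region> now;
        try {
            now = selectRegions(current, start, stop);
        } catch (IllegalArgumentException e) {
            return false; // a boundary disappeared, e.g. after a merge
        }
        if (now.size() != atRequest.size()) {
            return false; // a split or merge changed the number of regions
        }
        for (int i = 0; i < now.size(); i++) {
            if (!Arrays.equals(now.get(i).startKey, atRequest.get(i).startKey)
                    || !Arrays.equals(now.get(i).endKey, atRequest.get(i).endKey)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Region> layout = Arrays.asList(
                new Region("", "b"), new Region("b", "d"), new Region("d", ""));
        List<Region> atRequest = selectRegions(layout, bytes("b"), bytes("d"));

        // Same layout at snapshot time: the snapshot may proceed.
        System.out.println(boundariesUnchanged(atRequest, layout, bytes("b"), bytes("d"))); // true

        // Region [b, d) split into [b, c) and [c, d): the snapshot should fail.
        List<Region> afterSplit = Arrays.asList(
                new Region("", "b"), new Region("b", "c"), new Region("c", "d"), new Region("d", ""));
        System.out.println(boundariesUnchanged(atRequest, afterSplit, bytes("b"), bytes("d"))); // false
    }
}
```

Failing outright instead of silently widening or narrowing the range matches the issue's stated goal of never handing the user more or less data than they asked for; the cost is that the caller has to retry after any split or merge in the range.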
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.17)
                   0.98.18

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 1.3.0, 0.98.18
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.16)
                   0.98.17

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 1.3.0, 0.98.17
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.15)
                   0.98.16

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 1.3.0, 0.98.16
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.14)
                       (was: 0.94.26)
                   0.98.15

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 1.3.0, 0.98.15
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated HBASE-13031:
--------------------------------
    Fix Version/s:     (was: 1.2.0)
                   1.3.0

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 0.98.14, 1.3.0
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Status: Open  (was: Patch Available)

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.98.14, 1.2.0, 0.94.26
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.13)
                   0.98.14

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 0.98.14, 1.2.0
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-13031:
---------------------------------
    Fix Version/s:     (was: 1.1.0)
                   1.2.0

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 0.98.13, 1.2.0
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.12)
                   0.98.13

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.13
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-13031:
-----------------------------------
    Fix Version/s:     (was: 0.98.11)
                   0.98.12

Moving to 0.98.12

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.12
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Status: Open  (was: Patch Available)

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Attachment: HBASE-13031-v1.patch

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
     [ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Attachment:     (was: HBASE-13031-v1.patch)

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
>         Attachments: HBASE-13031.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Status: Patch Available  (was: Open)

> Ability to snapshot based on a key range
> ----------------------------------------
>
>                 Key: HBASE-13031
>                 URL: https://issues.apache.org/jira/browse/HBASE-13031
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: churro morales
>            Assignee: churro morales
>             Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
>         Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>
> Posted on the mailing list, and it seems some people are interested. A
> little background for everyone.
> We have a very large table that we would like to snapshot and transfer
> to another cluster (compressed data is always better to ship). Our problem
> is that it could take many weeks to transfer all of the data, and during
> that time, with major compactions, the data stored in DFS has the
> potential to double, which would cause us to run out of disk space.
> So we were thinking about adding the ability to snapshot a specific key
> range.
> Ideally, the user would specify a start and stop key, and those would be
> associated with region boundaries. If the boundaries change (due to merging
> or splitting of regions) between the time the user submits the request and
> the time the snapshot is taken, the snapshot should fail.
> We would know which regions to snapshot, and if those changed between when
> the request was submitted and when the regions were locked, the snapshot
> could simply fail and the user would try again, instead of potentially
> getting more or less than they had anticipated. I was planning on storing
> the start and stop keys in the SnapshotDescription, and from there it looks
> pretty straightforward: we just have to change the verifier code to
> accommodate the key ranges.
> If this design sounds good to anyone, or if I am overlooking anything, please
> let me know. Once we agree on the design, I'll write and submit the patches.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
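The check proposed in the description (select the regions that exactly cover the requested key range, then fail if that set has changed by the time the snapshot is taken) can be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached patches: `Region`, `BoundariesChanged`, `regions_for_range`, and `verify_before_snapshot` are hypothetical names, regions are modeled as half-open `[start, end)` byte ranges with `b""` meaning unbounded, and the table's regions are assumed to tile the key space without gaps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a region: [start, end), where b"" is unbounded."""
    start: bytes
    end: bytes


class BoundariesChanged(Exception):
    """The regions covering the requested range differ from those at request time."""


def regions_for_range(regions: List[Region], start: bytes, stop: bytes) -> Tuple[Region, ...]:
    """Return the regions lying inside [start, stop), in key order.

    Raises ValueError if start or stop does not coincide with a region boundary.
    Python compares bytes as unsigned, lexicographically ordered values.
    """
    ordered = sorted(regions, key=lambda r: r.start)
    selected = []
    for r in ordered:
        after_start = r.start >= start
        # An unbounded stop admits every region; otherwise the region must end by stop.
        before_stop = stop == b"" or (r.end != b"" and r.end <= stop)
        if after_start and before_stop:
            selected.append(r)
    if not selected or selected[0].start != start or selected[-1].end != stop:
        raise ValueError("start/stop key must match region boundaries")
    return tuple(selected)


def verify_before_snapshot(current: List[Region], start: bytes, stop: bytes,
                           expected: Tuple[Region, ...]) -> Tuple[Region, ...]:
    """Fail if a split or merge changed the regions covering [start, stop)."""
    try:
        actual = regions_for_range(current, start, stop)
    except ValueError as e:
        raise BoundariesChanged(str(e)) from e
    if actual != expected:
        raise BoundariesChanged("regions changed since the request was submitted")
    return actual
```

In this sketch the caller would record the result of `regions_for_range` when the request is submitted and pass it as `expected` once the regions are locked. A split or merge outside the requested range leaves the result unchanged, so only changes inside the range cause the snapshot to fail and be retried.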
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Attachment: HBASE-13031-v1.patch
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

churro morales updated HBASE-13031:
-----------------------------------
    Attachment: HBASE-13031.patch

Attached the trunk patch for the ability to snapshot key ranges. If folks are interested in getting this upstream, I can provide backports for 1.x, 0.98.x, and 0.94.
[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range
[ https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Latham updated HBASE-13031:
--------------------------------
             Priority: Major  (was: Critical)
    Affects Version/s:     (was: 0.98.11)
                           (was: 1.1.0)
                           (was: 0.94.26)
                           (was: 2.0.0)
        Fix Version/s: 0.94.26
                       0.98.11
                       1.1.0
                       2.0.0
           Issue Type: Improvement  (was: Brainstorming)