[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2016-06-20 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-13031:

Fix Version/s: (was: 0.98.18)
   (was: 1.3.0)
   (was: 2.0.0)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>
> Posted on the mailing list, and it seems some people are interested.  A 
> little background for everyone.
> We have a very large table that we would like to snapshot and transfer to 
> another cluster (compressed data is always better to ship).  Our problem is 
> that it could take many weeks to transfer all of the data, and during that 
> time, with major compactions, the data stored in DFS could double, which 
> would cause us to run out of disk space.
> So we were thinking about adding the ability to snapshot a specific key 
> range.  
> Ideally, the user would specify a start and stop key, each of which would 
> correspond to a region boundary.  If the boundaries change between the time 
> the user submits the request and the time the snapshot is taken (due to 
> merging or splitting of regions), the snapshot should fail.
> We would know which regions to snapshot, and if those changed between when 
> the request was submitted and when the regions were locked, the snapshot 
> could simply fail and the user would try again, instead of potentially giving 
> the user more or less than they had anticipated.  I was planning on storing 
> the start / stop key in the SnapshotDescription, and from there it looks 
> pretty straightforward: we just have to change the verifier code to 
> accommodate the key ranges.  
> If this design sounds good to anyone, or if I am overlooking anything, please 
> let me know.  Once we agree on the design, I'll write and submit the patches.
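
The boundary check described in the proposal above can be illustrated with a
small Python sketch. It is not taken from the attached patches and does not
use the HBase API; the names Region, select_regions, verify_unchanged and
SnapshotBoundaryError are hypothetical. It assumes the table's regions are
given as a sorted, contiguous list of half-open [start, end) key ranges, with
an empty key marking the start or end of the table.

```python
from dataclasses import dataclass
from typing import List


class SnapshotBoundaryError(Exception):
    """Raised when a key range does not line up with the region boundaries."""


@dataclass(frozen=True)
class Region:
    start: bytes  # inclusive; b"" means the start of the table
    end: bytes    # exclusive; b"" means the end of the table


def select_regions(regions: List[Region], start: bytes, stop: bytes) -> List[Region]:
    """Return the regions that exactly cover [start, stop), or raise.

    Assumes `regions` is sorted by start key and contiguous.
    """
    picked: List[Region] = []
    covering = False
    for region in regions:
        # Begin collecting at the region whose start key equals the requested start.
        if not covering and region.start == start:
            covering = True
        if covering:
            picked.append(region)
            # Done once a region ends exactly at the requested stop key.
            if region.end == stop:
                return picked
    raise SnapshotBoundaryError("start/stop keys do not match region boundaries")


def verify_unchanged(planned: List[Region], current: List[Region],
                     start: bytes, stop: bytes) -> None:
    """Fail if a split or merge changed the covered regions since the request."""
    if select_regions(current, start, stop) != planned:
        raise SnapshotBoundaryError("regions changed since the request was submitted")
```

In this sketch the caller would compute the planned region list when the
request is submitted, call verify_unchanged once the regions are locked, and
retry the whole request if it raises.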



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2016-01-07 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.17)
   0.98.18

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.18
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-10-31 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.16)
   0.98.17

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-09-21 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.15)
   0.98.16

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.16
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-08-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.14)
   (was: 0.94.26)
   0.98.15

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.15
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-06-22 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-13031:

Fix Version/s: (was: 1.2.0)
   1.3.0

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 0.98.14, 1.3.0
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-05-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Status: Open  (was: Patch Available)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.98.14, 1.2.0, 0.94.26
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-05-16 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.13)
   0.98.14

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 0.98.14, 1.2.0
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-04-27 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-13031:
-
Fix Version/s: (was: 1.1.0)
   1.2.0

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 0.98.13, 1.2.0
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-18 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.12)
   0.98.13

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.13
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13031:
---
Fix Version/s: (was: 0.98.11)
   0.98.12

Moving to 0.98.12

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.12
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Status: Open  (was: Patch Available)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Attachment: HBASE-13031-v1.patch

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Attachment: (was: HBASE-13031-v1.patch)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
> Attachments: HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Status: Patch Available  (was: Open)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Status: Open  (was: Patch Available)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Status: Patch Available  (was: Open)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-03-02 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Attachment: HBASE-13031-v1.patch

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
> Attachments: HBASE-13031-v1.patch, HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-02-26 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Status: Patch Available  (was: Open)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 1.1.0, 0.98.11, 0.94.26
>
> Attachments: HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-02-26 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-13031:
---
Attachment: HBASE-13031.patch

Attached the trunk patch for the ability to snapshot key ranges.  If folks are 
interested in getting this upstream, I can provide backports for 1.x, 0.98.x, 
and 0.94.



> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
> Attachments: HBASE-13031.patch
>
>


[jira] [Updated] (HBASE-13031) Ability to snapshot based on a key range

2015-02-12 Thread Dave Latham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Latham updated HBASE-13031:

         Priority: Major  (was: Critical)
Affects Version/s: (was: 0.98.11)
                   (was: 1.1.0)
                   (was: 0.94.26)
                   (was: 2.0.0)
    Fix Version/s: 0.94.26
                   0.98.11
                   1.1.0
                   2.0.0
       Issue Type: Improvement  (was: Brainstorming)

> Ability to snapshot based on a key range
> 
>
> Key: HBASE-13031
> URL: https://issues.apache.org/jira/browse/HBASE-13031
> Project: HBase
>  Issue Type: Improvement
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 2.0.0, 0.94.26, 1.1.0, 0.98.11
>
>