subject:"\[jira\] \[Updated\] \(HBASE\-21098\) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3"

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2019-02-01 Thread Andrew Purtell (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21098:
---
Fix Version/s: (was: 1.5.0)

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
>  Labels: s3
> Fix For: 3.0.0, 2.2.0, 1.4.9
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2019-01-16 Thread Nihal Jain (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal Jain updated HBASE-21098:
---
Labels: s3  (was: )

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
>  Labels: s3
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.9
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-12-13 Thread Andrew Purtell (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21098:
---
Fix Version/s: (was: 1.3.3)

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.9
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-12-12 Thread Andrew Purtell (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21098:
---
Fix Version/s: (was: 1.4.10)
   1.4.9

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.9
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-12-12 Thread Andrew Purtell (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21098:
---
Fix Version/s: 1.4.10
   1.3.3

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.10
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-10-12 Thread Zach York (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-21098:
--
Fix Version/s: 1.5.0

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-10-12 Thread Zach York (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-21098:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Assignee: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-10-05 Thread Zach York (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-21098:
--
Attachment: HBASE-21098.branch-1.002.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.branch-1.002.patch, HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-10-04 Thread Zach York (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-21098:
--
Attachment: HBASE-21098.branch-1.001.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21098.branch-1.001.patch, 
> HBASE-21098.master.001.patch, HBASE-21098.master.002.patch, 
> HBASE-21098.master.003.patch, HBASE-21098.master.004.patch, 
> HBASE-21098.master.005.patch, HBASE-21098.master.006.patch, 
> HBASE-21098.master.007.patch, HBASE-21098.master.008.patch, 
> HBASE-21098.master.009.patch, HBASE-21098.master.010.patch, 
> HBASE-21098.master.011.patch, HBASE-21098.master.012.patch, 
> HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-09-13 Thread Zach York (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach York updated HBASE-21098:
--
Fix Version/s: 2.2.0
   3.0.0

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-09-07 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.013.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch, HBASE-21098.master.013.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-09-06 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.012.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch, 
> HBASE-21098.master.012.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-09-05 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.011.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch, HBASE-21098.master.011.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-09-04 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.010.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch, 
> HBASE-21098.master.010.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-09-04 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.009.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch, HBASE-21098.master.009.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-30 Thread Mingliang Liu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HBASE-21098:
--
Affects Version/s: 1.4.8

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.4.8, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-29 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.008.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch, 
> HBASE-21098.master.008.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-29 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.007.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch, HBASE-21098.master.007.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-27 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.006.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch, 
> HBASE-21098.master.006.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-27 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.005.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch, HBASE-21098.master.005.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-24 Thread Ted Yu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21098:
---
Release Note: It is recommended to place the working directory on-cluster 
on HDFS as doing so has shown a strong performance increase due to data 
locality. It is important to note that the working directory should not overlap 
with any existing directories as the working directory will be cleaned out 
during the snapshot process. Beyond that, any well-named directory on HDFS 
should be sufficient.  (was: I recommend storing the working directory 
on-cluster on HDFS as doing so has shown a strong performance increase due to 
data locality. It is important to note that the working directory should not 
overlap with any existing directories as the working directory will be cleaned 
out during the snapshot process. Beyond that, any well-named directory on HDFS 
should be sufficient.)

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-24 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Release Note: I recommend storing the working directory on-cluster on HDFS 
as doing so has shown a strong performance increase due to data locality. It is 
important to note that the working directory should not overlap with any 
existing directories as the working directory will be cleaned out during the 
snapshot process. Beyond that, any well-named directory on HDFS should be 
sufficient.

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-24 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.004.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch, 
> HBASE-21098.master.004.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Description: 
When using Apache HBase, the snapshot feature can be used to make a point in 
time recovery. To do this, HBase creates a manifest of all the files in all of 
the Regions so that those files can be referenced again when a user restores a 
snapshot. With HBase's S3 storage mode, developers can store their data 
off-cluster on Amazon S3. However, utilizing S3 as a file system is inefficient 
in some operations, namely renames. Most Hadoop ecosystem applications use an 
atomic rename as a method of committing data. However, with S3, a rename is a 
separate copy and then a delete of every file which is no longer atomic and, in 
fact, quite costly. In addition, puts and deletes on S3 have latency issues 
that traditional filesystems do not encounter when manipulating the region 
snapshots to consolidate into a single manifest. When HBase on S3 users have a 
significant amount of regions, puts, deletes, and renames (the final commit 
stage of the snapshot) become the bottleneck causing snapshots to take many 
minutes or even hours to complete.

The purpose of this patch is to increase the overall performance of snapshots 
while utilizing HBase on S3 through the use of a temporary directory for the 
snapshots that exists on a traditional filesystem like HDFS to circumvent the 
bottlenecks.

  was:
When using Apache HBase, the snapshot feature can be used to make a point in 
time recovery. To do this, HBase creates a manifest of all the files in all of 
the Regions so that those files can be referenced again when a user restores a 
snapshot. With HBase storage mode S3, developers can store their data 
off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is inefficient 
in some operations, namely renames. Most Hadoop ecosystem applications use an 
atomic rename as a method of committing data. However, with S3, a rename is a 
separate copy and then a delete of every file which is no longer atomic and, in 
fact, quite costly. In addition, puts and deletes on S3 have latency issues 
that traditional filesystems do not encounter when manipulating the region 
snapshots. When HBase on S3 customers have a significant amount of regions, 
puts, deletes, and renames (the final commit stage of the snapshot) become the 
bottleneck causing snapshots to take many minutes or even hours to complete.

The purpose of this patch is to increase the overall performance of snapshots 
while utilizing HBase on S3 through the use of a temporary directory for the 
snapshots that exists on a traditional filesystem to circumvent the bottlenecks.


> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase's S3 storage mode, developers can store their 
> data off-cluster on Amazon S3. However, utilizing S3 as a file system is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots to consolidate into a single manifest. When 
> HBase on S3 users have a significant amount of regions, puts, deletes, and 
> renames (the final commit stage of the snapshot) become the bottleneck 
> causing snapshots to take many minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem like HDFS to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Affects Version/s: (was: 2.1.0)
   2.1.1
   3.0.0

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.1.1
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase storage mode S3, developers can store their 
> data off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots. When HBase on S3 customers have a 
> significant amount of regions, puts, deletes, and renames (the final commit 
> stage of the snapshot) become the bottleneck causing snapshots to take many 
> minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Status: Patch Available  (was: Open)

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase storage mode S3, developers can store their 
> data off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots. When HBase on S3 customers have a 
> significant amount of regions, puts, deletes, and renames (the final commit 
> stage of the snapshot) become the bottleneck causing snapshots to take many 
> minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.003.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch, HBASE-21098.master.003.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase storage mode S3, developers can store their 
> data off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots. When HBase on S3 customers have a 
> significant amount of regions, puts, deletes, and renames (the final commit 
> stage of the snapshot) become the bottleneck causing snapshots to take many 
> minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.002.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch, 
> HBASE-21098.master.002.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase storage mode S3, developers can store their 
> data off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots. When HBase on S3 customers have a 
> significant amount of regions, puts, deletes, and renames (the final commit 
> stage of the snapshot) become the bottleneck causing snapshots to take many 
> minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Attachment: HBASE-21098.master.001.patch

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Tyler Mi
>Priority: Major
> Attachments: HBASE-21098.master.001.patch
>
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase storage mode S3, developers can store their 
> data off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots. When HBase on S3 customers have a 
> significant amount of regions, puts, deletes, and renames (the final commit 
> stage of the snapshot) become the bottleneck causing snapshots to take many 
> minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

2018-08-23 Thread Tyler Mi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Mi updated HBASE-21098:
-
Summary: Improve Snapshot Performance with Temporary Snapshot Directory 
when rootDir on S3  (was: HBase on S3 Snapshot Performance Increase)

> Improve Snapshot Performance with Temporary Snapshot Directory when rootDir 
> on S3
> -
>
> Key: HBASE-21098
> URL: https://issues.apache.org/jira/browse/HBASE-21098
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Tyler Mi
>Priority: Major
>
> When using Apache HBase, the snapshot feature can be used to make a point in 
> time recovery. To do this, HBase creates a manifest of all the files in all 
> of the Regions so that those files can be referenced again when a user 
> restores a snapshot. With HBase storage mode S3, developers can store their 
> data off-cluster in Amazon S3. However, utilizing S3 as a FileSystem is 
> inefficient in some operations, namely renames. Most Hadoop ecosystem 
> applications use an atomic rename as a method of committing data. However, 
> with S3, a rename is a separate copy and then a delete of every file which is 
> no longer atomic and, in fact, quite costly. In addition, puts and deletes on 
> S3 have latency issues that traditional filesystems do not encounter when 
> manipulating the region snapshots. When HBase on S3 customers have a 
> significant amount of regions, puts, deletes, and renames (the final commit 
> stage of the snapshot) become the bottleneck causing snapshots to take many 
> minutes or even hours to complete.
> The purpose of this patch is to increase the overall performance of snapshots 
> while utilizing HBase on S3 through the use of a temporary directory for the 
> snapshots that exists on a traditional filesystem to circumvent the 
> bottlenecks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

[jira] [Updated] (HBASE-21098) Improve Snapshot Performance with Temporary Snapshot Directory when rootDir on S3

30 matches

Site Navigation

Mail list logo

Footer information