[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286329#comment-13286329 ]

stack commented on HBASE-50:

Should we close out this issue in favor of HBASE-6055, Jesse?

Snapshot of table

Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Affects Versions: 0.96.0
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Labels: gsoc
Fix For: 0.96.0
Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png

Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286340#comment-13286340 ]

Jesse Yates commented on HBASE-50:

@stack: +1 I don't see a point in keeping this open, given the codebase divergence.
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279324#comment-13279324 ]

Jesse Yates commented on HBASE-50:

After moving the code up to trunk, I realized that the implementation has basically changed completely, except for a couple of small classes. As such, I opened HBASE-6055 to track the rest of my progress.
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241515#comment-13241515 ]

Zhihong Yu commented on HBASE-50:

@Jesse: When you create a new review request on Review Board, leave the Bugs field empty. Thanks.
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240625#comment-13240625 ]

Jesse Yates commented on HBASE-50:

bq. maybe fully consistent across the whole table isn't necessary

This is something that has come up in multiple conversations: are full-table, consistent snapshots necessary? The default answer seems to be yes, because that is what we get with mysql/postgres/oracle, with the implication that we _have to have it_ for production HBase. However, it's an inherently different storage model, and it may be that we just need to change the pointy-haired managers' ideas of what's necessary. If instead we can say it is a consistent view, per regionserver, at some point from 2:51:00 to 2:51:30 pm on Friday, then we can save a lot of pain and be done with _very little downtime_, to the point where you can mask it in a 'slow request'. If that isn't acceptable, then the only way I see to get the fully consistent view is to drop write availability while the snapshot is taking place (the fastest region has to wait on the slowest region to finish).

There is no reason with the current design that we couldn't allow both; it would just be another flag to pass in while making the snapshot, and it really only touches about 5 lines of the guts of the implementation.

Currently, we only guarantee row-level consistency. Getting into cross-RS consistency means we bump up against CAP and have to give up even more availability. For a fully consistent view, all RS need to block writes, which could take as long as the timeout (30 sec to 1 min in the worst case) or the slowest region, whichever comes first, and in many cases that is unacceptable.

However, suppose you can accept point-in-time within a window (sloppy snapshot), say a thirty-second window, where each region blocks writes only for the time that region takes to make its snapshot (a few seconds at most, since no data is being copied; just a few references are created and counts incremented). Then you keep availability high without really sacrificing any of the consistency guarantees we currently make. Clearly, this breaks the multi-put situation where two puts go to different servers, but that is potentially acceptable since we don't make guarantees there either, just that on return, all of the puts have completed. It is the same problem as with any of the current backup solutions (replication not included). If you are worried about cross-RS transactions, you have to use something like Omid (or similar) to track transactions, and that system can then also decide when a snapshot is allowable to ensure all parts of the multi-put complete.

bq. If we're not reusing much of Chongxin's code, we should put discussion into a new JIRA

A lot of the infrastructure we have has changed (e.g. zookeeper utilities, locking on the RS), but the new features (reference counting, possibly importing/exporting snapshots, etc.) will definitely be reused exactly or only slightly modified. So it's about 50/50 on what is kept and what is tossed, at least right now. We have also gone through like 3 different stops and starts on this ticket. I worry moving to a new ticket will cause even worse fragmentation, at least until the code being used doesn't even resemble Chongxin's :)
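The per-region "sloppy snapshot" idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from the HBASE-50 patch or from HBase itself: the class `SloppySnapshotSketch`, the `Region` stand-in, and the methods `addStoreFile`, `snapshot`, `snapshotTable`, and `canDelete` are all hypothetical names. The sketch assumes store files are immutable once written, so a snapshot only needs to record references and bump reference counts while writes to that one region are paused.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: none of these class or method names come from HBase.
class SloppySnapshotSketch {

    // Stand-in for one region: immutable store files plus a gate that can pause writes.
    static class Region {
        private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();
        private final List<String> storeFiles = Collections.synchronizedList(new ArrayList<>());

        // Writers take the shared side of the gate, so they never block each other.
        void addStoreFile(String fileName) {
            gate.readLock().lock();
            try {
                storeFiles.add(fileName);
            } finally {
                gate.readLock().unlock();
            }
        }

        // Pauses writes to this region only, records references, and copies no data.
        List<String> snapshot(Map<String, Integer> refCounts) {
            gate.writeLock().lock();
            try {
                List<String> references = new ArrayList<>(storeFiles);
                for (String file : references) {
                    refCounts.merge(file, 1, Integer::sum); // one more snapshot pins this file
                }
                return references;
            } finally {
                gate.writeLock().unlock();
            }
        }
    }

    // "Sloppy" table snapshot: each region is captured on its own, with no cross-region barrier.
    static List<String> snapshotTable(List<Region> regions, Map<String, Integer> refCounts) {
        List<String> all = new ArrayList<>();
        for (Region region : regions) {
            all.addAll(region.snapshot(refCounts));
        }
        return all;
    }

    // A file may be deleted (for example after compaction) only when no snapshot references it.
    static boolean canDelete(String file, Map<String, Integer> refCounts) {
        return refCounts.getOrDefault(file, 0) == 0;
    }

    public static void main(String[] args) {
        Region r1 = new Region();
        Region r2 = new Region();
        r1.addStoreFile("r1-hfile-a");
        r2.addStoreFile("r2-hfile-a");
        Map<String, Integer> refCounts = new ConcurrentHashMap<>();
        List<String> snapshot = snapshotTable(Arrays.asList(r1, r2), refCounts);
        r1.addStoreFile("r1-hfile-b"); // written after the snapshot, so not referenced
        System.out.println("snapshot references: " + snapshot);
        System.out.println("can delete r1-hfile-a: " + canDelete("r1-hfile-a", refCounts));
        System.out.println("can delete r1-hfile-b: " + canDelete("r1-hfile-b", refCounts));
    }
}
```

Because each region is locked independently, a write to one region is never stalled by a slow snapshot on another, which is the availability argument made in the comment. The price is that the table-wide result is only consistent per region within the window, not across regions.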
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240157#comment-13240157 ]

Jesse Yates commented on HBASE-50:

We had a meetup within a meetup at the HBase User Group meetup tonight to talk about the difficulties and next steps with snapshotting. The main takeaways were:
* exact time is inconsistent across a cluster (even with NTP), and we would need millisecond exactness for point-in-time snapshots
* two-phase commit where we block writes in the first phase (completes in a set timeout) seems the most reasonable approach for ensuring a fully consistent view
* maybe fully consistent across the whole table isn't necessary; maybe per-RS consistency within a window is acceptable
** possibly achieved by scheduling a time for a snapshot sometime in the future in zk and letting each RS snapshot at that time, which makes it 'close enough'
* zk-triggered snapshots make it even harder to ensure timeout boundaries, since the RS get no hard guarantees on notifications, and even then zk timeouts can cause presence issues

Even with all of this, I'm planning the first pass to be zk-based (until we decide that availability suffers too much) and with a simple two-phase, write-locking approach per involved region. We can probably iterate on that to bring down the latency. The main issue here is that I don't see a way to ensure that, in a snapshot, all RS take a snapshot 'now' but still allow reading/writing on either side (pretty sure this is a CAP limitation).
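The two-phase, write-locking approach mentioned above can be sketched roughly as follows. This is a simplified illustration, not the design from the attached reports or any HBase code: the `RegionParticipant` interface, the `FakeRegion` test double, and the `snapshot` coordinator method are hypothetical, the coordinator runs in a single process rather than through ZooKeeper, and regions are prepared one after another instead of in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the interface and class names are illustrative, not HBase APIs.
class TwoPhaseSnapshotSketch {

    interface RegionParticipant {
        boolean prepare(); // phase 1: block writes; false if the region cannot take part
        void commit();     // phase 2: record the snapshot and unblock writes
        void abort();      // give up: unblock writes without taking a snapshot
    }

    // In-memory stand-in for a region, used only to exercise the coordinator.
    static class FakeRegion implements RegionParticipant {
        final boolean willPrepare;
        boolean writesBlocked;
        boolean snapshotTaken;

        FakeRegion(boolean willPrepare) {
            this.willPrepare = willPrepare;
        }

        public boolean prepare() {
            if (willPrepare) {
                writesBlocked = true;
            }
            return willPrepare;
        }

        public void commit() {
            snapshotTaken = true;
            writesBlocked = false;
        }

        public void abort() {
            writesBlocked = false;
        }
    }

    // Returns true only if every region prepared before the deadline and was then committed.
    static boolean snapshot(List<? extends RegionParticipant> regions, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        List<RegionParticipant> prepared = new ArrayList<>();
        for (RegionParticipant region : regions) {
            boolean ok = region.prepare();
            if (ok) {
                prepared.add(region); // remember it so its writes can be unblocked on abort
            }
            boolean timedOut = System.nanoTime() - deadline > 0;
            if (!ok || timedOut) {
                for (RegionParticipant p : prepared) {
                    p.abort();
                }
                return false;
            }
        }
        for (RegionParticipant region : prepared) {
            region.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        FakeRegion a = new FakeRegion(true);
        FakeRegion b = new FakeRegion(true);
        System.out.println("all regions ready: " + snapshot(Arrays.asList(a, b), 30_000L));

        FakeRegion c = new FakeRegion(true);
        FakeRegion d = new FakeRegion(false);
        System.out.println("one region fails: " + snapshot(Arrays.asList(c, d), 30_000L));
    }
}
```

The availability cost discussed in the thread is visible here: every prepared region keeps its writes blocked until the slowest region has prepared or the timeout fires, which is why the per-region windowed variant is attractive when strict cross-region consistency is not required.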
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240168#comment-13240168 ]

Zhihong Yu commented on HBASE-50:

bq. maybe fully consistent across the whole table isn't necessary

Can you explain more about the above use case?

This issue has more than 4 years of history and 6 sub-tasks. If we're not reusing much of Chongxin's code, we should put the discussion into a new JIRA.
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233992#comment-13233992 ]

stack commented on HBASE-50:

Go for it Jesse. My guess is that a lot has changed since but basic notions should still hold.
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234116#comment-13234116 ]

Jesse Yates commented on HBASE-50:

I've been thinking about how to do this for a while (before reading the ticket) and came up with a pretty similar architecture - good to know I wasn't crazy. A bit bummed initially seeing how much Chongxin had gotten done, but going through his stuff it turns out I need to rewrite almost everything (smile) along with a couple tweaks here and there. I'll put up a doc with the architecture diffs (nothing major planned, mostly OO design stuff at the moment) when I get closer to done
[jira] [Commented] (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232908#comment-13232908 ]

Jesse Yates commented on HBASE-50:

If no one is working on this, I'd like to pick up shepherding Chongxin's original patch (updating to trunk, completion of features, etc.) into trunk.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906486#action_12906486 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/

(Updated 2010-09-06 04:34:53.459404)

Review request for hbase.

Changes

Add MapReduce-based export (ExportSnapshot) and import (ImportSnapshot) for snapshots, so that a snapshot of an HBase table can be exported and imported to other data centers. Unit test (TestSnapshotExport) has passed.

Summary

This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshot via ZooKeeper
2. Create snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshot

Currently snapshots can be created as expected, but cannot be restored or deleted yet.

This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)

bin/add_snapshot_family.rb PRE-CREATION
src/main/java/org/apache/hadoop/hbase/HConstants.java bfaa4a1
src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
src/main/java/org/apache/hadoop/hbase/SnapshotDescriptor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/TablePartiallyOpenException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java d35a28a
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 20860d6
src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
src/main/java/org/apache/hadoop/hbase/mapreduce/ExportSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/mapreduce/ImportSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 2deea4a
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/HMaster.java 4735304
src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotSentinel.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 9fdd86d
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 8356d64
src/main/java/org/apache/hadoop/hbase/regionserver/Snapshotter.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java f1d52b7
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 107d641
src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 20a535c
src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3256ac9
src/main/resources/hbase-default.xml 419bc6d
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java fadee21
src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
src/test/java/org/apache/hadoop/hbase/mapreduce/TestSnapshotExport.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/master/TestLogsCleaner.java 8b7f60f
src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java c425953
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901206#action_12901206 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review995

bin/add_snapshot_family.rb
http://review.cloudera.org/r/467/#comment3204
Please remove this comment.

src/main/java/org/apache/hadoop/hbase/HConstants.java
http://review.cloudera.org/r/467/#comment3203
Should the archive directory be named .archive ?

- Ted

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900319#action_12900319 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review957

1. Rename SnapshotTracker to SnapshotSentinel
2. Write a script (add_snapshot_family.rb) to add snapshot family for META and remove method HMaster.addSnapshotFamily. The script is not tested yet (how?)

- Chongxin
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900321#action_12900321 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/
---

(Updated 2010-08-19 08:35:37.043957)

Review request for hbase.

Summary
-------

This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshot via ZooKeeper
2. Create snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshot

Currently snapshots can be created as expected, but cannot be restored or deleted yet.

This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)
---------------

bin/add_snapshot_family.rb PRE-CREATION
src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
src/main/java/org/apache/hadoop/hbase/SnapshotDescriptor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/TablePartiallyOpenException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a
src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotSentinel.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
src/main/java/org/apache/hadoop/hbase/regionserver/Snapshotter.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
src/main/resources/hbase-default.xml b73f0ff
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
src/test/java/org/apache/hadoop/hbase/master/TestLogsCleaner.java 8b7f60f
src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 38ef520
src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION

Diff: http://review.cloudera.org/r/467/diff

Testing
-------

Unit tests and integration tests with mini cluster passed.

Thanks,

Chongxin
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899195#action_12899195 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review924
---

src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
http://review.cloudera.org/r/467/#comment3015

Should call Thread.currentThread().interrupt()

src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java
http://review.cloudera.org/r/467/#comment3016

Should call Thread.currentThread().interrupt()

- Ted
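The two comments above presumably refer to places where an InterruptedException is caught and then dropped. Catching that exception clears the thread's interrupt flag, so callers further up the stack can no longer see that an interrupt arrived. A minimal sketch of the usual remedy follows, in plain Java. It is not the ZooKeeperWrapper code under review; the class InterruptExample and the method awaitQuietly are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; not part of the HBASE-50 patch.
class InterruptExample {

    /**
     * Waits on the latch for at most timeoutMs milliseconds.
     * If the wait is interrupted, the interrupt flag is restored
     * rather than swallowed, and false is returned.
     */
    static boolean awaitQuietly(CountDownLatch latch, long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // The catch cleared the flag; set it again so callers can react.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate a pending interrupt
        boolean done = awaitQuietly(new CountDownLatch(1), 10);
        System.out.println("done=" + done
            + ", interrupted=" + Thread.currentThread().isInterrupted());
        Thread.interrupted(); // clear the flag before exiting
    }
}
```

The alternative is to let the method declare InterruptedException and rethrow it; restoring the flag is the option that fits when the method signature cannot change.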
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899208#action_12899208 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review927
---

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment3024

Write a script that calls this method.

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898127#action_12898127 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-12 10:33:25, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java, line 98
bq. http://review.cloudera.org/r/467/diff/4/?file=6589#file6589line98
bq.
bq. Is there more to be done here ?

Deleting the region dir?

bq. On 2010-08-12 10:33:25, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java, line 94
bq. http://review.cloudera.org/r/467/diff/4/?file=6589#file6589line94
bq.
bq. Should return value be checked ?

Deleting the snapshot directory at the end would delete all snapshot files anyway. Do we still have to check the return value? If the return value is false, should we just log it?

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review874
---
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898203#action_12898203 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review897
---

src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
http://review.cloudera.org/r/467/#comment2925

We should log if we fail to delete.

src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
http://review.cloudera.org/r/467/#comment2924

Yes.

- Ted
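The point about logging a failed delete can be sketched as follows. This is an illustration only: it uses java.io.File and java.util.logging as stand-ins, not the Hadoop FileSystem API or the logging used in DeleteSnapshot.java, and the names DeleteCheckExample and deleteAndLog are hypothetical.

```java
import java.io.File;
import java.util.logging.Logger;

// Hypothetical example class; a stand-in for the delete call under review.
class DeleteCheckExample {
    private static final Logger LOG = Logger.getLogger("DeleteCheckExample");

    /**
     * Deletes a file or an empty directory. The boolean result of
     * delete() is checked, and a warning is logged when it is false.
     */
    static boolean deleteAndLog(File target) {
        boolean deleted = target.delete();
        if (!deleted) {
            LOG.warning("Failed to delete " + target.getAbsolutePath());
        }
        return deleted;
    }

    public static void main(String[] args) {
        File missing = new File(System.getProperty("java.io.tmpdir"),
            "no-such-snapshot-" + System.nanoTime());
        // Nothing exists at this path, so a warning is logged.
        System.out.println("deleted=" + deleteAndLog(missing));
    }
}
```

Logging rather than throwing matches the reply in the thread: a later delete of the enclosing snapshot directory removes the leftovers anyway, so the failure is worth recording but need not abort the operation.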
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897609#action_12897609 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 234
bq. http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line234
bq.
bq. You might want to check the returns from these methods.

The snapshot root dir might already exist, e.g. created during a previous startup, in which case mkdirs would return false. But this is normal. Here is Todd's earlier comment: "you can just call mkdirs, I think, and it won't fail if it already exists".

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review823
---
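The question in this exchange is how to check a mkdirs result without treating "already exists" as an error. A minimal sketch follows, using java.io.File as a stand-in: its mkdirs() returns false when the directory already exists, which is the situation described in the reply. The Hadoop FileSystem.mkdirs contract may differ, so this is an assumption-laden illustration rather than the HMaster code; EnsureDirExample and ensureDir are hypothetical names.

```java
import java.io.File;

// Hypothetical example class; java.io.File stands in for the DFS here.
class EnsureDirExample {

    /**
     * Returns true if dir exists as a directory after the call,
     * whether this call created it or it was already there.
     */
    static boolean ensureDir(File dir) {
        // File.mkdirs() returns false for an existing directory, so a
        // false result alone does not prove failure; confirm with isDirectory().
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"),
            "snapshot-root-" + System.nanoTime());
        System.out.println("first call:  " + ensureDir(dir));
        System.out.println("second call: " + ensureDir(dir));
        dir.delete(); // clean up
    }
}
```

With this shape the return value is still checked, as stack asked, while a restart that finds the directory in place is not reported as a failure.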
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897619#action_12897619 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 166
bq. http://review.cloudera.org/r/467/diff/3/?file=6019#file6019line166
bq.
bq. Want to remove this or enable the assertion? One or the other I'd say rather than this.

I'll remove it.

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java, line 1
bq. http://review.cloudera.org/r/467/diff/3/?file=6021#file6021line1
bq.
bq. It's a pity this class is named so. We're about to bring in a new patch that redoes the zk stuff -- breaks it up into pieces each with a singular purpose; e.g. tracking root location or tracking meta region server -- and unfortunately the pattern is to name these purposed classes *Tracker. There'll be a clash of this kinda Tracker and the new zk Trackers. Not important, just saying in case you have another name in mind for this class.

I'll think about it. Any suggestions?

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2288
bq. http://review.cloudera.org/r/467/diff/3/?file=6024#file6024line2288
bq.
bq. And flushing is disabled at this point too, right? Compactions? (Good).

Yes, flushing and compaction are both disabled while a snapshot is being taken.

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 944
bq. http://review.cloudera.org/r/467/diff/3/?file=6027#file6027line944
bq.
bq. Do we have to do this down at the Store level? Could we move it up to Region or up to the RegionServer itself? It already has an HTable instance.

This method is only used to delete old store files after compaction. Is it appropriate to move it to Region?

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java, line 382
bq. http://review.cloudera.org/r/467/diff/3/?file=6037#file6037line382
bq.
bq. What about a test of restore from snapshot? Is there one? I don't see it?

It's already in TestAdmin.

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/util/FSUtils.java, line 713
bq. http://review.cloudera.org/r/467/diff/3/?file=6032#file6032line713
bq.
bq. Does this stuff belong in here in this general utility class? Should it be polluted with References? Should this stuff be over in io package where the Reference is or static methods on Reference?

OK, I'll move it to Reference.

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 267
bq. http://review.cloudera.org/r/467/diff/3/?file=6028#file6028line267
bq.
bq. Why you have to pass the reference? It wasn't needed previously?

Previously there was only one type of reference file, i.e. the reference created after a split. Now there is a second type of reference file, for snapshots. We need to know the reference type to locate the referred-to file. This is used for tables restored from a snapshot.

bq. On 2010-08-11 11:32:27, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 2355
bq. http://review.cloudera.org/r/467/diff/3/?file=6024#file6024line2355
bq.
bq. If snapshot fails, do we have to do cleanup?

An HRegion simply leaves snapshot mode if the snapshot fails. The master is notified of the failure and does the cleanup for the whole snapshot.

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review840
---
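Two of the answers in this message describe a small state machine: flushes and compactions are refused while a region is in snapshot mode, and the region leaves snapshot mode when the snapshot fails, with the failure reported upward for cleanup. The sketch below illustrates that shape in plain Java. It is an assumption-based illustration, not the HRegion code from the patch; SnapshotModeExample, tryFlush, snapshot, and inSnapshotMode are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example class; models only the snapshot-mode flag.
class SnapshotModeExample {
    private final AtomicBoolean snapshotMode = new AtomicBoolean(false);

    /** Refuses to flush while a snapshot is in progress. */
    boolean tryFlush() {
        if (snapshotMode.get()) {
            return false; // flushing is disabled during a snapshot
        }
        // A real flush would happen here.
        return true;
    }

    /**
     * Runs the snapshot work in snapshot mode. The mode is always left
     * afterwards, and any failure propagates to the caller, which plays
     * the role of the master in the thread's description.
     */
    void snapshot(Runnable work) {
        if (!snapshotMode.compareAndSet(false, true)) {
            throw new IllegalStateException("snapshot already in progress");
        }
        try {
            work.run();
        } finally {
            snapshotMode.set(false); // quit snapshot mode on success or failure
        }
    }

    boolean inSnapshotMode() {
        return snapshotMode.get();
    }

    public static void main(String[] args) {
        SnapshotModeExample region = new SnapshotModeExample();
        region.snapshot(() ->
            System.out.println("flush allowed during snapshot: " + region.tryFlush()));
        System.out.println("flush allowed afterwards: " + region.tryFlush());
    }
}
```

The flag check in tryFlush is not atomic with the flush itself, so a real implementation would need a lock or similar coordination; the sketch only shows the enter, refuse, and always-exit pattern.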
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897660#action_12897660 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/
---

(Updated 2010-08-12 02:43:42.872855)

Review request for hbase.

Summary
-------

This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshot via ZooKeeper
2. Create snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshot

Currently snapshots can be created as expected, but cannot be restored or deleted yet.

This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)
---------------

src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
src/main/java/org/apache/hadoop/hbase/SnapshotDescriptor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/TablePartiallyOpenException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a
src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
src/main/java/org/apache/hadoop/hbase/regionserver/Snapshotter.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
src/main/resources/hbase-default.xml b73f0ff
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
src/test/java/org/apache/hadoop/hbase/master/TestLogsCleaner.java 8b7f60f
src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 38ef520
src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION

Diff: http://review.cloudera.org/r/467/diff

Testing
-------

Unit tests and integration tests with mini cluster passed.

Thanks,

Chongxin
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897666#action_12897666 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review869
---

src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java
http://review.cloudera.org/r/467/#comment2875

How about SnapshotWatcher ?

src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
http://review.cloudera.org/r/467/#comment2874

I think putting this in Region is good.

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
http://review.cloudera.org/r/467/#comment2876

Can we get to hbase root directly ?

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897732#action_12897732 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-12 02:53:06, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java, line 1
bq. http://review.cloudera.org/r/467/diff/3/?file=6021#file6021line1
bq.
bq. How about SnapshotWatcher ?

Wouldn't that sound as if this class implements ZooKeeper's Watcher interface?

bq. On 2010-08-12 02:53:06, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 283
bq. http://review.cloudera.org/r/467/diff/3/?file=6028#file6028line283
bq.
bq. Can we get to hbase root directly ?

Since this method is static, we would probably need another parameter for the root directory?

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review869
---
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897735#action_12897735 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673
bq. http://review.cloudera.org/r/467/diff/3/?file=6002#file6002line673
bq.
bq. This is fine for an hbase that is a fresh install but what about case where the data has been migrated from an older hbase version; it won't have this column family in .META. We should make a little migration script that adds it or on start of new version, check for it and if not present, create it.
bq.
bq. Chongxin Li wrote:
bq. That's right. But the AddColumn operation requires the table to be disabled, and the ROOT table cannot be disabled once the system is started. Then how could we execute the migration script, or check and create it on start of the new version?

Could this be done with a script that runs while HBase is shut down? The script would scan the root region with MetaUtils and add the column family SNAPSHOT to the .META. table.

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review823
---
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897854#action_12897854 ]

HBase Review Board commented on HBASE-50:
-----------------------------------------

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review874
---

src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
http://review.cloudera.org/r/467/#comment2888

Check return value.

src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
http://review.cloudera.org/r/467/#comment2887

Should return value be checked ?

src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
http://review.cloudera.org/r/467/#comment2886

Is there more to be done here ?

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897246#action_12897246 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 22:40:31, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 962
bq. http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line962
bq.
bq. Moving crashed snapshots has two benefits:
bq. 1. future call to listSnapshots() wouldn't encounter IOException.
bq. 2. it's easy for user to get statistics on failed snapshots and analyze them
bq.
bq. Or, if you log enough information when cleaning up the failed snapshot.

What about a snapshot that fails while it is being created? Currently it is cleaned up if an exception occurs in HMaster.snapshot. Should we also move it to this directory? Then, for reference information sync, should we also take the reference files of these failed snapshots into account?

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review830
---
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897250#action_12897250 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673
bq. http://review.cloudera.org/r/467/diff/3/?file=6002#file6002line673
bq.
bq. This is fine for an hbase that is a fresh install but what about case where the data has been migrated from an older hbase version; it won't have this column family in .META. We should make a little migration script that adds it or on start of new version, check for it and if not present, create it.

That's right. But the AddColumn operation requires the table to be disabled, and the ROOT table cannot be disabled once the system has started. How, then, could we run the migration script, or check for the column family and create it on start of the new version?

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 899
bq. http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line899
bq.
bq. Can the snapshot name be empty and then we'll make one up?

A default snapshot name? Or an auto-generated snapshot name, such as the creation time?

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 951
bq. http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line951
bq.
bq. For restore of the snapshot, do you use loadtable.rb or Todd's new bulkloading scripts?

Currently, no. A snapshot is composed of a list of log files and a set of reference files for the HFiles of the table. These reference files have the same hierarchy as the original table, and each name has the format 1239384747630.tablename, where the first part is the file name of the referred HFile and the second part is the table name for the snapshot.

Thus, to restore a snapshot, just copy the reference files (which are only a few bytes each) to the table dir, update META and split the logs. When the table is enabled, the system knows how to replay the commit edits and read such a reference file. The methods getReferredToFile and open in StoreFile are updated to deal with this kind of reference file for snapshots.

At present, a snapshot can only be restored to a table whose name is the same as the one for which the snapshot was created. That is, the old table with the same name must be deleted before restoring a snapshot. That's what I do in the unit test TestAdmin. Restoring a snapshot to a different table name has a low priority and has not been implemented yet.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 50
bq. http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line50
bq.
bq. Whats this? A different kind of reference?

Yes. This is the reference file in a snapshot. It references an HFile of the original table.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 115
bq. http://review.cloudera.org/r/467/diff/3/?file=6018#file6018line115
bq.
bq. This looks like a class that you could write a unit test for?

Sure, I'll add another case in TestLogsCleaner.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java, line 130
bq. http://review.cloudera.org/r/467/diff/3/?file=6017#file6017line130
bq.
bq. If table were big, this could be prohibitively expensive? A single-threaded copy of all of a tables data? We could compliment this with MR-base restore, something that did the copy using MR?

This method is only used in RestoreSnapshot, where the reference files of the snapshot are copied to the table dir. These reference files contain just a few bytes each, not the table's data. Snapshots share the table data with the original table and with other snapshots. Do we still need an MR job?

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java, line 212
bq. http://review.cloudera.org/r/467/diff/3/?file=6013#file6013line212
bq.
bq. Why Random negative number? Why not just leave it blank?

If a blank value were used as the key, only one item would be left in the end during the first few scans of the regions. Using a random negative number indicates that these regions have not been scanned before. Once a region has been scanned, it has a last checking time instead.

bq. On 2010-08-10 21:34:40, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 251
bq. http://review.cloudera.org/r/467/diff/3/?file=6012#file6012line251
bq.
bq. Is this comment right?

I just renamed the Ranges to caps, comment was not
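The reference-file naming described in this reply (front part is the referred HFile name, back part is the snapshot's table name) can be illustrated with a small sketch. The class and method names below are hypothetical and do not come from the patch, and splitting on the first '.' is an assumption based only on the single example name 1239384747630.tablename.

```java
/**
 * Hypothetical helper illustrating the "<hfile name>.<table name>" format
 * of snapshot reference files described in the thread. Not part of HBase.
 */
final class SnapshotReferenceName {
    final String referredHFile;
    final String snapshotTable;

    private SnapshotReferenceName(String referredHFile, String snapshotTable) {
        this.referredHFile = referredHFile;
        this.snapshotTable = snapshotTable;
    }

    /** Splits on the first '.', assuming the HFile name itself contains no dot. */
    static SnapshotReferenceName parse(String fileName) {
        int dot = fileName.indexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("not a snapshot reference name: " + fileName);
        }
        return new SnapshotReferenceName(fileName.substring(0, dot), fileName.substring(dot + 1));
    }

    /** Rebuilds the reference file name from its two parts. */
    String toFileName() {
        return referredHFile + "." + snapshotTable;
    }

    public static void main(String[] args) {
        SnapshotReferenceName ref = parse("1239384747630.tablename");
        System.out.println("HFile " + ref.referredHFile + ", snapshot table " + ref.snapshotTable);
    }
}
```

Because a reference carries only these two names, copying it during restore is cheap regardless of how large the referred HFile is, which is the point made in the reply about RestoreSnapshot.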
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897257#action_12897257 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 22:20:23, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 156
bq. http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line156
bq.
bq. I think the current code is backward compatible. Boolean value of true is interpreted as TOP, value of false is BOTTOM.
bq. Since ENTIRE is introduced, this code is not backward compatible.
bq.
bq. See:
bq. http://download.oracle.com/javase/1.4.2/docs/api/java/io/DataOutput.html#writeBoolean%28boolean%29

Why is it not backward compatible once ENTIRE is introduced? The value for ENTIRE is 2, which differs from the values previously written for the boolean.

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review829
---
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897474#action_12897474 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review846
---

src/main/java/org/apache/hadoop/hbase/io/Reference.java
http://review.cloudera.org/r/467/#comment2846

I meant value of 2 cannot be correctly interpreted as boolean.

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2847

I think we need to limit the space consumed by failed snapshots. This issue can be addressed by a future JIRA.

- Ted
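The compatibility question about Reference.java can be made concrete with plain java.io streams. The sketch below is not the code under review: the Range enum and the four helper methods are hypothetical stand-ins, and the codes BOTTOM = 0, TOP = 1, ENTIRE = 2 are taken from the values mentioned in the thread. It relies only on documented behavior: writeBoolean writes a single byte (1 for true, 0 for false) and readBoolean treats any nonzero byte as true.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical demo of boolean-vs-byte range encoding; not HBase code. */
final class RangeCompatDemo {
    enum Range {
        BOTTOM(0), TOP(1), ENTIRE(2);

        final int code;

        Range(int code) { this.code = code; }

        static Range fromCode(int code) {
            for (Range r : values()) {
                if (r.code == code) return r;
            }
            throw new IllegalArgumentException("unknown range code " + code);
        }
    }

    /** Old format: a boolean, true meaning TOP and false meaning BOTTOM. */
    static byte[] writeOld(boolean top) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBoolean(top);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** New format: one byte holding the range code. */
    static byte[] writeNew(Range range) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(range.code);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Old reader: only knows about the boolean. */
    static Range readOld(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readBoolean() ? Range.TOP : Range.BOTTOM;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** New reader: interprets the byte as a range code. */
    static Range readNew(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return Range.fromCode(in.readByte());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("old TOP read by new reader: " + readNew(writeOld(true)));
        System.out.println("old BOTTOM read by new reader: " + readNew(writeOld(false)));
        System.out.println("new ENTIRE read by old reader: " + readOld(writeNew(Range.ENTIRE)));
    }
}
```

Under these assumptions both positions in the thread hold in one direction each: a new reader decodes files written by the old boolean format correctly, while an old boolean-based reader silently decodes a byte of 2 as TOP rather than failing.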
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896963#action_12896963 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review799
---

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2704

Do we need to abort TableSnapshot processing in case of exception?

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2707

If you create directory for failed snapshots, you can also add listFailedSnapshots() method.

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2705

It would be better to move crashed snapshots into a separate directory under snapshot rootDir.

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897042#action_12897042 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review803
---

src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
http://review.cloudera.org/r/467/#comment2713

IOException should be handled so that synchronization of reference counts isn't interrupted.

- Ted
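One way to read this review comment is that a failure on one region should be caught inside the loop, so the remaining regions are still synchronized. The following is a minimal sketch under that reading; ReferenceSyncLoop, RegionSync and syncAll are hypothetical names, not BaseScanner code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a sync loop that survives per-region I/O errors. */
final class ReferenceSyncLoop {
    /** Stand-in for whatever synchronizes one region's reference count. */
    interface RegionSync {
        void sync(String regionName) throws IOException;
    }

    /**
     * Runs the action for every region. An IOException on one region is
     * recorded and logged, and the loop moves on to the next region.
     * Returns the regions that failed so a later scan can retry them.
     */
    static List<String> syncAll(List<String> regions, RegionSync action) {
        List<String> failed = new ArrayList<>();
        for (String region : regions) {
            try {
                action.sync(region);
            } catch (IOException e) {
                failed.add(region);
                System.err.println("reference sync failed for " + region + ": " + e.getMessage());
            }
        }
        return failed;
    }
}
```

Returning the failed regions instead of rethrowing keeps the decision about retries with the caller; whether the real scanner should retry immediately or wait for the next scan is not settled in the thread.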
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897143#action_12897143 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 10:49:06, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java, line 36
bq. http://review.cloudera.org/r/467/diff/3/?file=6001#file6001line36
bq.
bq. Drop the H. Call it SnapshotDescriptor

Alright.

bq. On 2010-08-10 10:49:06, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java, line 41
bq. http://review.cloudera.org/r/467/diff/3/?file=6001#file6001line41
bq.
bq. If it is in under the snapshot directory maybe just call this file snapshotinfo? Drop the '.' prefix. The '.' prefix is usually to demark 'special' files we don't want to consider as part of normal operation. In this case, we are under a snapshot directory, already outside of 'normal' operation.

This is named following .regioninfo.

bq. On 2010-08-10 10:49:06, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 373
bq. http://review.cloudera.org/r/467/diff/3/?file=6000#file6000line373
bq.
bq. How often is this called? If it happens alot, it could add up -- be expensive.

Not too often, actually. This method is only called in BaseScanner when the reference rows in META are checked and synchronized with the reference files. Right now at most five rows are checked in one scan of META.

No region info is saved in a reference row. The reference row key, which is a combination of SNAPSHOT_PREFIX and the region name, is therefore parsed to obtain the region name. That's why we need this method.

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review800
---
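The bounded check described in this reply (at most five reference rows per scan of META) can be sketched as a small scheduler that always picks the regions checked longest ago. This is an illustration only: the class, its fields, the sentinel value -1 for a never-checked region, and the tie-break by region name are all assumptions of the sketch, not details of the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical scheduler: check a few regions per scan, oldest check first. */
final class ReferenceCheckScheduler {
    static final int MAX_CHECKS_PER_SCAN = 5;
    static final long NEVER_CHECKED = -1L;

    /** Region name mapped to the time of its last reference check. */
    private final Map<String, Long> lastChecked = new HashMap<>();

    void addRegion(String region) {
        lastChecked.putIfAbsent(region, NEVER_CHECKED);
    }

    /**
     * Picks up to MAX_CHECKS_PER_SCAN regions with the oldest check time
     * (ties broken by name) and stamps them with the current scan time.
     */
    List<String> nextBatch(long now) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(lastChecked.entrySet());
        entries.sort((a, b) -> {
            int byTime = Long.compare(a.getValue(), b.getValue());
            return byTime != 0 ? byTime : a.getKey().compareTo(b.getKey());
        });
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < MAX_CHECKS_PER_SCAN; i++) {
            batch.add(entries.get(i).getKey());
        }
        for (String region : batch) {
            lastChecked.put(region, now);
        }
        return batch;
    }
}
```

With seven regions, a first scan would check five of them and a second scan would start with the two that were skipped, so every region is reached after a bounded number of scans even though each scan does a fixed amount of work.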
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897144#action_12897144 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-10 10:04:44, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 962
bq. http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line962
bq.
bq. It would be better to move crashed snapshots into a separate directory under snapshot rootDir.

If so, we would probably need the above method. But why move crashed snapshots into a separate directory? It would be pretty hard to recover a crashed snapshot.

bq. On 2010-08-10 10:04:44, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 945
bq. http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line945
bq.
bq. If you create directory for failed snapshots, you can also add listFailedSnapshots() method.

Currently there is no directory for failed snapshots. If a snapshot fails, it is cleaned up and an exception is thrown to notify the user.

bq. On 2010-08-10 10:04:44, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 930
bq. http://review.cloudera.org/r/467/diff/3/?file=6015#file6015line930
bq.
bq. Do we need to abort TableSnapshot processing in case of exception ?

For a snapshot created by TableSnapshot, the table must be offline and the snapshot is driven entirely by the master. Region servers are not aware of such a snapshot. So in case of an exception, we just clean up the failed snapshot; there is no need to abort the snapshot across the cluster. As for SnapshotMonitor, it only monitors snapshots that are created across the region servers.

- Chongxin

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review799
---
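The behavior described in this reply (a failed snapshot is cleaned up and the exception is passed on to the user) follows a common pattern, sketched here against the local file system with java.nio.file rather than HDFS. Everything named below is hypothetical; the sketch only shows the shape of "create, do the work, delete everything on failure, rethrow".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical local-filesystem sketch of cleanup on snapshot failure. */
final class SnapshotCleanupSketch {
    /** Stand-in for the work that fills a snapshot directory. */
    interface SnapshotWork {
        void run(Path snapshotDir) throws IOException;
    }

    /**
     * Creates the snapshot directory and runs the work. If the work fails,
     * the partial snapshot is removed and the original exception is rethrown.
     * Creating a directory that already exists fails with
     * FileAlreadyExistsException before any work starts.
     */
    static void createSnapshot(Path snapshotRoot, String name, SnapshotWork work) throws IOException {
        Path dir = Files.createDirectory(snapshotRoot.resolve(name));
        try {
            work.run(dir);
        } catch (IOException | RuntimeException e) {
            deleteRecursively(dir);
            throw e;
        }
    }

    /** Deletes children before parents by visiting paths in reverse order. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

The alternative raised in the review, moving the partial snapshot into a separate directory for failed snapshots, would replace the deleteRecursively call with a move; the thread leaves open which of the two is preferable.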
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897146#action_12897146 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review829
---

src/main/java/org/apache/hadoop/hbase/io/Reference.java
http://review.cloudera.org/r/467/#comment2793

I think the current code is backward compatible. Boolean value of true is interpreted as TOP, value of false is BOTTOM.
Since ENTIRE is introduced, this code is not backward compatible.

See:
http://download.oracle.com/javase/1.4.2/docs/api/java/io/DataOutput.html#writeBoolean%28boolean%29

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897153#action_12897153 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review830
---

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2794

Moving crashed snapshots has two benefits:
1. future call to listSnapshots() wouldn't encounter IOException.
2. it's easy for user to get statistics on failed snapshots and analyze them

Or, if you log enough information when cleaning up the failed snapshot.

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896498#action_12896498 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/
---

(Updated 2010-08-09 03:52:11.875655)

Review request for hbase.

Changes
---

Quite a lot of changes have been made according to Todd's review. Here are the major ones:
1. Refactor SnapshotMonitor into one part that is master-global and another part that is created once per snapshot (SnapshotTracker).
2. Catch exceptions in HMaster.snapshot and clean up the snapshot if exceptions occur.
3. Always quit snapshot mode for regions, no matter whether the snapshot is created successfully on the RS.
4. Add a mechanism to check and synchronize the reference count in META with the number of reference files in BaseScanner.
5. Add snapshot operations: DeleteSnapshot, RestoreSnapshot and corresponding tests (in TestAdmin).

Summary
---

This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshot via ZooKeeper
2. Create snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshot

Currently snapshots can be created as expected, but can not be restored or deleted yet

This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50

Diffs (updated)
---

src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a
src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a
src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
src/main/resources/hbase-default.xml b73f0ff
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9
src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896515#action_12896515 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 22
bq. http://review.cloudera.org/r/467/diff/2/?file=4140#file4140line22
bq.
bq. worth noting that this class is not thread-safe? I don't know if these classes need to be thread safe, but you're using an unsynchronized hashset. Also, since refreshHLogsAndSearch clears hlogs before re-adding stuff, it needs to be synchronized more than just using a synchronized collection.

This class is only instantiated once by LogsCleaner, so it can be seen as a singleton per master.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 116
bq. http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line116
bq.
bq. does ZKW automatically re-watch the nodes for you, here?
bq.
bq. Also, how does this interact with region server failure? We just assume that the snapshot will timeout and abort, instead of proactively detecting?

Yes, the ZKW automatically re-watches the nodes.

For snapshot abort: if any region server fails to perform the snapshot, it removes the corresponding ready and finished nodes under the snapshot directory. This notifies the master of the snapshot failure, and the master then aborts the snapshot on all region servers via ZK.

Snapshot timeout is not detected here. In the method waitToFinish, the snapshot status is checked at a regular interval (3 seconds here). If this method times out, an exception is thrown and the master aborts the snapshot across the cluster.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java, line 132
bq. http://review.cloudera.org/r/467/diff/2/?file=4143#file4143line132
bq.
bq. is there a process that scans for cases where the reference count has gotten out of sync?
bq. I'm worried about a case where a snapshot is half-done, and then it fails, so the snapshot is considered aborted, but we never clean up the references because META has been incremented.

This is added in the META scanner. Since scanning reference files is expensive, only a few regions are checked and synchronized in one scan. A last-checked time is recorded so that all reference regions are guaranteed to be checked eventually.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java, line 1403
bq. http://review.cloudera.org/r/467/diff/2/?file=4153#file4153line1403
bq.
bq. these checks are inherently racy

Then remove it?

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 585
bq. http://review.cloudera.org/r/467/diff/2/?file=4148#file4148line585
bq.
bq. this seems prone to collision if it's multithreaded, since the exists check and the use of the filename aren't atomic

Then how to guarantee atomicity? This file name must be unique with respect to both existing files and files which have already been compacted and deleted. Otherwise there might be a name collision in the archive directory for HFiles.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java, line 132
bq. http://review.cloudera.org/r/467/diff/2/?file=4130#file4130line132
bq.
bq. since we're using the snapshot name as a directory name in HDFS, it has to be a UTF8 string, so why not just keep it as a String above too?

I implemented this class following HTableDescriptor. Even a table name is usually handled as a byte array instead of a String.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 61
bq. http://review.cloudera.org/r/467/diff/2/?file=4134#file4134line61
bq.
bq. to keep compatibility with current storefiles, entire should be value 2, and bottom should be 0
bq.
bq. while we're at it, maybe rename these to be all caps - Range.TOP, Range.BOTTOM, etc

These have been renamed in the latest revision.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 918-919
bq. http://review.cloudera.org/r/467/diff/2/?file=4138#file4138line918
bq.
bq. should this be an exception, rather than a result code? ie is it normal to fail?

Currently all results except ALL_FINISH throw an exception.

bq. On 2010-08-02 13:41:35, Todd Lipcon wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 925
bq.
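On the StoreFile naming question in this reply ("Then how to guarantee atomicity?"), one common approach is to let the file system perform the existence check and the creation as a single call, instead of calling exists() first and using the name afterwards. The following is only a minimal sketch under stated assumptions: it uses java.nio on a local file system rather than HDFS, and UniqueFileSketch and createUniqueFile are hypothetical names that do not appear in the patch under review. On HDFS the analogous step would presumably be a create call with overwrite disabled.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper (not part of HBase): claims a unique file name atomically. */
final class UniqueFileSketch {
    private UniqueFileSketch() {}

    /**
     * Creates an empty, uniquely named file in dir and returns its path.
     * Files.createFile performs the existence check and the creation as one
     * atomic operation, so two threads cannot both claim the same name.
     */
    static Path createUniqueFile(Path dir, int maxAttempts) throws IOException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Random candidate name; the dashes are stripped only for readability.
            Path candidate = dir.resolve(UUID.randomUUID().toString().replace("-", ""));
            try {
                return Files.createFile(candidate);
            } catch (FileAlreadyExistsException e) {
                // Another writer claimed this name first; try a new candidate.
            }
        }
        throw new IOException("no unique name found after " + maxAttempts + " attempts");
    }
}
```

This only protects against collisions with files that currently exist in the same directory. Names of files that were already compacted and moved to an archive directory, which the reply also worries about, would still need a separate check or a naming scheme that never reuses names.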
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12896516#action_12896516 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

bq. On 2010-08-03 09:58:06, Ted Yu wrote:
bq. src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 246
bq. http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line246
bq.
bq. I think this should be (retries == 4) for 3 retries

This is actually not a 'retry' of the snapshot; it checks whether the snapshot has finished, up to three times (retries = 0, 1, 2).

- Chongxin

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review631
---

Snapshot of table
Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png

Having an option to take a snapshot of a table would be very useful in production. What I would like to see this option do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
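The bounded status check discussed in the message above (three checks, with the counter taking the values 0, 1, 2, at a fixed interval) can be sketched as follows. This is an illustration only, not the code in SnapshotMonitor: the class name SnapshotWaitSketch, the parameter names, and the use of a BooleanSupplier as the status probe are all assumptions made for the example.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a bounded "is the snapshot finished yet?" poll. */
final class SnapshotWaitSketch {
    private SnapshotWaitSketch() {}

    /**
     * Calls isFinished up to maxChecks times, sleeping intervalMillis between
     * checks. Returns true as soon as a check succeeds, and false if every
     * check fails or the thread is interrupted while waiting.
     */
    static boolean waitToFinish(BooleanSupplier isFinished, int maxChecks, long intervalMillis) {
        for (int checks = 0; checks < maxChecks; checks++) {
            if (isFinished.getAsBoolean()) {
                return true;
            }
            if (checks < maxChecks - 1) { // no need to sleep after the last check
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

With maxChecks = 3 the loop counter takes the values 0, 1 and 2, which matches the reading given in the reply: three status checks rather than three retries of the snapshot itself. A caller would treat a false result as a timeout and abort the snapshot.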
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894937#action_12894937 ]

HBase Review Board commented on HBASE-50:

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review631
---

src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java
http://review.cloudera.org/r/467/#comment2373
I think this should be (retries == 4) for 3 retries

- Ted
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894706#action_12894706 ]

HBase Review Board commented on HBASE-50:

Message from: Todd Lipcon t...@cloudera.com

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review618
---

looks pretty good! I didn't get a chance to look through the test cases in detail, I'll try to look them over some more later this week.

src/main/java/org/apache/hadoop/hbase/HConstants.java
http://review.cloudera.org/r/467/#comment2293
since we also have a log archive dir somewhere, should specify this a little more - this is archived HFiles that are still referenced by snapshots?

src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java
http://review.cloudera.org/r/467/#comment2294
license

src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java
http://review.cloudera.org/r/467/#comment2295
no need for @param javadoc if there is no actual description attached. same thing below in a few places

src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java
http://review.cloudera.org/r/467/#comment2296
why not System.currentTimeMillis?

src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java
http://review.cloudera.org/r/467/#comment2297
empty @return

src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java
http://review.cloudera.org/r/467/#comment2298
since we're using the snapshot name as a directory name in HDFS, it has to be a UTF8 string, so why not just keep it as a String above too?

src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java
http://review.cloudera.org/r/467/#comment2299
no need for this javadoc (it's obvious)

src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java
http://review.cloudera.org/r/467/#comment2300
same with this one

src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
http://review.cloudera.org/r/467/#comment2301
add TODO to this comment

src/main/java/org/apache/hadoop/hbase/io/Reference.java
http://review.cloudera.org/r/467/#comment2302
to keep compatibility with current storefiles, entire should be value 2, and bottom should be 0

while we're at it, maybe rename these to be all caps - Range.TOP, Range.BOTTOM, etc

src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
http://review.cloudera.org/r/467/#comment2303
no need to check size() - iterating the empty array should be fine

src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java
http://review.cloudera.org/r/467/#comment2304
if we crash between step 1 and 2, we orphan the archived file. Instead, we can do the delete first (ignoring failure if it doesn't exist) and then update META.

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2305
you can just call mkdirs, I think, and it won't fail if it already exists

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2306
should this be an exception, rather than a result code? ie is it normal to fail?

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2309
do we have a race here? what if the table gets enabled while the snapshot is being processed? it seems we need some locking here around table status and snapshot modification

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2311
shouldn't we rethrow in this error case? and in the above error case? ie these should be clauses like:

    boolean success = false;
    try {
      ... make snapshot ...
      success = true;
    } finally {
      if (!success) {
        deleteSnapshot();
      }
    }

src/main/java/org/apache/hadoop/hbase/master/HMaster.java
http://review.cloudera.org/r/467/#comment2313
would it be problematic to create a partially written snapshotinfo file? or would it get cleaned up at a higher layer? (perhaps worth creating snapshotinfo.tmp, then atomically rename it to snapshotinfo if it writes correctly)

src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java
http://review.cloudera.org/r/467/#comment2314
license

src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java
http://review.cloudera.org/r/467/#comment2315
worth noting that this class is not thread-safe? I don't know if these classes need to be thread safe, but you're using an unsynchronized hashset. Also, since refreshHLogsAndSearch clears hlogs before re-adding stuff, it needs to be synchronized more than just using a synchronized collection.
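Two of the suggestions in this review, the success-flag cleanup in a finally clause and writing snapshotinfo.tmp before renaming it to snapshotinfo, combine naturally. The sketch below shows one way that could look. It is a minimal illustration under stated assumptions: it uses java.nio on a local file system instead of the HDFS API the patch works with, and SnapshotInfoWriterSketch and writeAtomically are hypothetical names that are not in the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: write to a temp file, then rename; clean up on failure. */
final class SnapshotInfoWriterSketch {
    private SnapshotInfoWriterSketch() {}

    /**
     * Writes content to "<name>.tmp" in dir and renames it to "<name>" only
     * after the write has completed, so readers never see a partial file.
     */
    static Path writeAtomically(Path dir, String name, String content) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path target = dir.resolve(name);
        boolean success = false;
        try {
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // ATOMIC_MOVE asks for a single rename step; it throws if the
            // file system cannot provide one.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            success = true;
            return target;
        } finally {
            if (!success) {
                // Mirrors the deleteSnapshot() call in the reviewer's outline:
                // remove whatever was partially written, then let the
                // original exception propagate to the caller.
                Files.deleteIfExists(tmp);
            }
        }
    }
}
```

Because the cleanup sits in finally rather than in a catch block, the original exception is rethrown unchanged, which is the behaviour the review asks for in the error cases.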
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894369#action_12894369 ]

HBase Review Board commented on HBASE-50:

Message from: Chongxin Li lichong...@zju.edu.cn

---
This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/
---

Review request for hbase.

Summary
-------
This patch includes the first three sub-tasks of HBASE-50:
1. Start and monitor the creation of snapshot via ZooKeeper
2. Create snapshot of an HBase table
3. Some existing functions of HBase are modified to support snapshot

Currently snapshots can be created as expected, but can not be restored or deleted yet.

This addresses bug HBASE-50.
http://issues.apache.org/jira/browse/HBASE-50

Diffs
-----
src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5
src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690
src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270
src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0
src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c
src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed
src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b
src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39
src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d
src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8
src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62
src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4
src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736
src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c
src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286
src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java 4d4b00a
src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481
src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5
src/main/resources/hbase-default.xml b73f0ff
src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9
src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5
src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION
src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION

Diff: http://review.cloudera.org/r/467/diff

Testing
-------
Unit tests and integration tests with mini cluster passed.

Thanks,
Chongxin
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887983#action_12887983 ]

stack commented on HBASE-50:

I took a quick look, Li.

What is the SNAPSHOTINFO_FILE? Is it the name of the file that we write snapshot data to? Should it be named for the snapshot name? Looks like you name the dir that holds this file for the snapshot name? Do we need a directory? Can we get away with just files that are named for the snapshot name and that hold the snapshot data?

You should add javadoc comments to your classes; say what the class is for (hmm... seems like you usually do... just the first few in this commit seem to be missing them... they are there for the others).

So far, it looks great. Keep up the good work. Tests next I'd say.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883779#action_12883779 ]

Li Chongxin commented on HBASE-50:

bq. isSnapshot in HRI?
bq. Will keeping snapshot data in .META. work? .META. is by region but regions are deleted after a split but you want your snapshot to live beyond this?

Snapshot data, actually the reference counts of hfiles, will be kept in the .META. table, but in a different row than the original region row. So this reference count information will not be deleted after a split. Reference count information is saved here because it is also region-centric. The reference counts of a region's hfiles are kept together in one row in .META., whether an hfile is still in use or has been archived. I described this in Appendix A of the document.

bq. In zk, writeZnode and readZnode ain't the best names for methods... what kinda znodes are these? (Jon says these already exist, that they are not your fault)

Actually the method names for snapshot are startSnapshotOnZK, abortSnapshotOnZK, and registerRSForSnapshot in ZooKeeperWrapper. I put writeZnode and readZnode in the diagram because I think I can use them inside the above methods. Do you think we should make writeZnode and readZnode private and just use them inside ZooKeeperWrapper?

bq. Can you make a SnapShot class into which encapsulate all related to snapshotting rather than adding new data members to HMaster? Maybe you do encapsulate it all into snapshotmonitor?

I haven't figured out all the data members in the design. I will create a Snapshot class to encapsulate the related fields if necessary during implementation.

bq. Can you call RSSnapshotHandler just SnapshotHandler?

Sure.

bq. You probably don't need to support String overloads.

You mean methods in HBaseAdmin?

A repository has been created on github with the initial content of hbase/trunk: http://github.com/lichongxin/hbase-snapshot
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883300#action_12883300 ]

Jonathan Gray commented on HBASE-50:

Agree, github probably much better. Would still be advisable to put patches onto reviewboard but I like idea of taking committers out of critical path.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882867#action_12882867 ]

stack commented on HBASE-50:

Thinking on it, Li, maybe it's best if you work up in github and just log here when you do big pushes to your github repo? That way you are in charge of it and not dependent on laggard hbase committers getting your work into the branch?
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882779#action_12882779 ]

stack commented on HBASE-50:

I'll make a branch to host Li's work going forward.
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880454#action_12880454 ]

Li Chongxin commented on HBASE-50:

Sure. We do need a branch for snapshot. Currently I'm working on TRUNK. Once the stuff is ready, I think we can create a new feature branch for commit. What do you think?
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880093#action_12880093 ]

Li Chongxin commented on HBASE-50:

bq. Fail with a warning. A nice-to-have would be your suggestion of restoring snapshot into a table named something other than the original table's name (Fixing this issue is low-priority IMO).
bq. .. it's a good idea to allow snapshot restore to a new table name while the original table is still online. And the restored snapshot should be able to share HFiles with the original table

I will make this issue a low-priority sub-task. One more question: besides metadata and log files, what other data needs to be taken care of when renaming the snapshot to a new table name? Are there any other files (e.g. HFiles) containing the table name?

bq. ... didn't we discuss that .META. might not be the place to keep snapshot data since regions are deleted when the system is done w/ them (but a snapshot may outlive a particular region).

I misunderstood... I thought you were talking about creating a new catalog table 'snapshot' to keep the metadata of snapshots, such as creation time. In the current design, a region will not be deleted if it is still used by a snapshot, even if the system is done with it. This region would probably be marked as 'deleted' in .META. This is discussed in sections 6.2 and 6.3, and no new catalog table is added. Do you think it is appropriate to keep metadata in .META. for a deleted region? Do we still need a new catalog table?

bq. rather than causing all of the RS to roll the logs, they could simply record the log sequence number of the snapshot, right? This will be a bit faster to do and causes even less of a hiccup in concurrent operations (and I don't think it's any more complicated to implement, is it?)

Yes, sounds good. The log sequence number should also be included when the logs are split, because log files would contain data from both before and after the snapshot, right?

bq. Making the client orchestrate the snapshot process seems a little strange - could the client simply initiate it and put the actual snapshot code in the master? I think we should keep the client as thin as we can

Ok, this will change the design a little.

bq. I'd be interested in a section about failure analysis - what happens when the snapshot coordinator fails in the middle? ..

That will be great!
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880276#action_12880276 ] Jonathan Gray commented on HBASE-50:

+1 on feature branch once stuff is ready for commit
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878411#action_12878411 ] stack commented on HBASE-50:

bq. What if the table with the same name is still online when we want to restore a snapshot

Fail with a warning. A nice-to-have would be your suggestion of restoring snapshot into a table named something other than the original table's name (Fixing this issue is low-priority IMO).

There are a few things in the above that make me want to go over the design again. I'll report back after I've done that. Specifically:

bq. Rename 'reference' family in .META. to 'snapshot'

... didn't we discuss that .META. might not be the place to keep snapshot data since regions are deleted when the system is done w/ them (but a snapshot may outlive a particular region).
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877121#action_12877121 ] Jean-Daniel Cryans commented on HBASE-50:

bq. Yes..That sounds good. I will implement another LogCleanerDelegate, say ReferenceLogCleaner or SnapshotLogCleaner.

Latter. Some refactoring could be done on how to chain multiple delegates without doing a bunch of ifs in the code. Could be in the scope of another jira.

bq. Do you archive any other files besides log files, say HFiles?

AFAIK, no.
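One way the "chain multiple delegates without a bunch of ifs" refactoring could look is sketched below. It is only an illustration: `CleanerChain` and the nested `CleanerDelegate` interface are hypothetical stand-ins (the real LogCleanerDelegate interface may have a different signature), and the assumed policy is that a log may be deleted only when every delegate agrees.

```java
import java.util.Arrays;
import java.util.List;

public class CleanerChain {

    /** Hypothetical stand-in for a log cleaner delegate; not the HBase interface. */
    public interface CleanerDelegate {
        boolean isLogDeletable(String logName);
    }

    private final List<CleanerDelegate> delegates;

    public CleanerChain(CleanerDelegate... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    /**
     * A log is deletable only if no delegate objects. Adding a new
     * policy (for example a snapshot-aware cleaner) means adding one
     * more delegate to the list, not one more if-branch here.
     */
    public boolean isLogDeletable(String logName) {
        for (CleanerDelegate delegate : delegates) {
            if (!delegate.isLogDeletable(logName)) {
                return false;
            }
        }
        return true;
    }
}
```

A snapshot-aware delegate would then simply return false for any log file that an outstanding snapshot still references.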
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876699#action_12876699 ] Li Chongxin commented on HBASE-50:

bq. ... but also after snapshot is done your design should include description of how files are archived, rather than deleted...

Are you talking about files that are no longer used by the hbase table but are still referenced by a snapshot? I think this has been described in chapter 6, 'Snapshot Maintenance'. For example, hfiles are archived in the delete directory, and section 6.4 describes how these files will be cleaned up.

bq. ..In fact you'll probably be doing a snapshot of at least a subset of .META. on every table snapshot I'd imagine - at least the entries for the relevant table.

The .META. entries for the snapshot table have already been dumped, haven't they? Why do we still need a snapshot of a subset of .META.?

bq. So, do you foresee your restore-from-snapshot running split over the logs as part of the restore? That makes sense to me.

Yes, restore-from-snapshot has to run a split over the WAL logs. That takes some time, so restore-from-snapshot will not be very fast.

bq. Why you think we need a Reference to the hfile? Why not just a file that lists the names of all the hfiles? We don't need to execute the snapshot, do we? Restoring from a snapshot would be a bunch of file renames and wal splitting?

At first I thought the snapshot should probably keep the table directory structure for later use. For example, a reader like HalfStoreFileReader could be provided so that we could read from the snapshot directly. But yes, we don't actually execute the snapshot, so keeping a list of all the hfiles (actually one list per RS, right?) should be enough. Also, restoring from a snapshot is not just file renames. Since an hfile might be referenced by several snapshots, we should probably do a real copy when restoring, right?

bq. Shall we name the new .META. column family snapshot rather than reference?

Sure.

bq. On the filename '.deleted', I think it a mistake to give it a '.' prefix especially given its in the snapshot dir...

OK, I will rename the snapshot dir to '.snapshot'. For the dir '.deleted', what name do you think we should use? Because there might be several snapshots under the dir '.snapshot', each with its own snapshot name, I named this dir '.deleted' to distinguish it from a snapshot name.

bq. Do you need a new catalog table called snapshots to keep list of snapshots, of what a snapshot comprises and some other metadata such as when it was made, whether it succeeded, who did it and why?

It would be much more convenient if a catalog table 'snapshot' could be created. Would this impact the normal operation of hbase?

bq. Section 7.4 is missing split of WAL files. Perhaps this can be done in a MR job?

I'll add the split of WAL logs. Yes, an MR job can be used. Which method do you think is better: reading from the imported file and inserting into the table through the hbase API, or just copying the hfile into place and updating .META.?

bq. Lets not have the master run the snapshot... let the client run it?

bq. Snapshot will be doing same thing whether table is partially online or not..

I put these two issues together because I think they are related. In the current design, if a table is open, the snapshot is performed by each RS that serves the table's regions. Otherwise, if a table is closed, the snapshot is performed by the master because the table is not served by any RS. The first comment is about a closed table, so the master performs the snapshot because the client does not have access to the underlying dfs. For the second one, I was thinking that if a table is partially online, its regions might be partly served by RSs and partly offline, right? Then who will perform the snapshot? If the RSs do, the regions that are offline will be missed. If the master does, regions that are online might lose data in the memstore. I'm confused.

bq. It's a synchronous way. Do you think this is appropriate? Yes. I'm w/ JG on this.

This is another problem that confuses me. In the current design (which is synchronous), a snapshot is started only when all the RSs are ready for it. Then all RSs perform the snapshot concurrently. This guarantees that the snapshot is not started if one RS fails. If we switch to an asynchronous approach, should each RS start the snapshot immediately when it is ready?
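The concern above, that one hfile may be referenced by several snapshots and so cannot simply be renamed or deleted, could be handled with per-file reference counting. The sketch below is an assumption for illustration only: `HFileReferenceCounter` and its methods are hypothetical names, and the counts live in memory here, whereas the discussion places such bookkeeping in .META.

```java
import java.util.HashMap;
import java.util.Map;

public class HFileReferenceCounter {

    // Number of snapshots currently referencing each hfile name.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Called when a snapshot that lists this hfile is created. */
    public void addReference(String hfile) {
        counts.merge(hfile, 1, Integer::sum);
    }

    /** Called when a snapshot that lists this hfile is deleted. */
    public void removeReference(String hfile) {
        // Returning null from the remapping function removes the entry.
        counts.computeIfPresent(hfile, (name, n) -> n > 1 ? n - 1 : null);
    }

    /** An archived hfile may be cleaned up only when this returns false. */
    public boolean isReferenced(String hfile) {
        return counts.containsKey(hfile);
    }
}
```

Under this scheme a restore would copy a referenced file rather than move it, so that the other snapshots listing the same hfile stay intact.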
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876259#action_12876259 ] Li Chongxin commented on HBASE-50:

@Stack, thanks for the comments. Here are some replies and questions.

bq. + I don't think you should take on requirement 1), only the hbase admin can create a snapshot. There is no authentication/access control in hbase currently - its coming but not here yet - and without it, this would be hard for you to enforce.

I think I didn't state it properly. I know access control is not included in hbase currently. What I mean is that snapshot should be put in the class HBaseAdmin instead of HTable. Client-side operations are divided between these two classes partly with future access control in mind, aren't they?

bq. + Regards requirement 2., I'd suggest that how the snapshot gets copied out from under hbase should also be outside the scope of your work. I'd say your work is making a viable snapshot that can be copied with perhaps some tests to prove it works - that might copy off data - but in general, i'd say how actual copying is done is outside of the scope of this issue.

Strictly speaking, requirement 2 is not about how the snapshot is copied out from under hbase. In the current design, table data is not actually copied when a snapshot is taken. To make it fast, the snapshot just captures the state of the table, in particular all the table files. So requirement 2 only means making sure the table data (the hfiles) are not mutated while the snapshot is taken.

bq. + How you going to ensure table is in 'good status'. Can you not snapshot it whatever its state? All regions being on line is a requirement?

For tables that are disabled, all regions being online should not be a requirement. As for 'good status', what I'm thinking is that a table region could be in the PENDING_OPEN or PENDING_CLOSE state, in which it might be half opened. I'm not sure whether the RS or the master should take responsibility for performing the snapshot at that point. On the other hand, if the table is completely opened or closed, the snapshot can be taken by the RS or the master respectively.

bq. + FYI, wal logs are now archived, not deleted. Replication needs them. Replication might also be managing clean up of the archives (j-d, whats the story here?) If an outstanding snapshot, one that has not been deleted, then none of its wals should be removed.

Great. In the current design, WAL log files are the only data files that are really copied. If they are now archived instead of deleted, we can create references to log files, just as for hfiles, instead of copying the actual data. This will further shorten the snapshot time. Another LogCleanerDelegate, say ReferencedLogCleaner, could be created to check whether a log file may be deleted, taking snapshots into account. What do you think?

bq. + I can say 'snapshot' all tables? Can I say 'snapshot catalog tables - meta and root tables?'

I think a snapshot of meta works fine, but a snapshot of the root table is a little tricky. When a snapshot is performed for a user table, .META. is updated to keep track of the file references. If a .META. table is snapshotted, -ROOT- can be updated to keep track of the file references. But where do we keep the file references for the -ROOT- table (region) if it is snapshotted? Still in -ROOT-? Should this newly updated file reference information also be included in the snapshot?

bq. + If a RS fails between 'ready' and 'finish', does this mean we abandon the snapshot?

Yes. If an RS fails between 'ready' and 'finish', it should notify the client or the master, whichever orchestrates; the client or the master will then send a signal via ZK to stop the snapshot on all RSs. Something like this.

bq. + I'd say if RS is not ready for snapshot, just fail it. Something is badly wrong is a RS can't snapshot.

Currently, there is a timeout for snapshot readiness. If an RS is ready, it waits for all the other RSs to be ready, and then the snapshot starts on all RSs. Otherwise, the ready RSs time out and the snapshot does not start on any RS. This is a synchronous approach. Do you think this is appropriate? Will it create too much load to perform the snapshot concurrently on the RSs? (Jonathan prefers an asynchronous method.)

bq. + Would it make sense for there to be a state between ready and finish and the data in this intermediate state would be the RS's progress?

Do you mean a znode is created for each RS to keep the progress? Then how do you define the RS's progress? What data will be kept in this znode?

Thanks again for the comments. I will update the design document based on them.
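The synchronous start described above (every region server reports ready, and the snapshot starts only if all of them do so before a timeout) can be sketched with a latch. This is a single-process illustration under stated assumptions: `SnapshotBarrier`, `markReady`, and `awaitAllReady` are hypothetical names, and the design under discussion would coordinate through ZooKeeper rather than an in-memory latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SnapshotBarrier {

    // One count per region server that must report ready.
    private final CountDownLatch ready;

    public SnapshotBarrier(int regionServers) {
        this.ready = new CountDownLatch(regionServers);
    }

    /** Called once by each region server when it is ready to snapshot. */
    public void markReady() {
        ready.countDown();
    }

    /**
     * Returns true if every region server reported ready within the
     * timeout. On false the snapshot must not start on any server.
     */
    public boolean awaitAllReady(long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt and treat it as "not ready".
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

An asynchronous variant would drop the barrier and let each server proceed as soon as it is ready, at the cost of having to undo partial work when another server fails.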
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876352#action_12876352 ] stack commented on HBASE-50:

On 5, Snapshot Creation:

bq. Because this table region must be online, dumping the HRegionInfo of the region to a file .regioninfo under the snapshot directory of this region will obtain the metadata.

...The above is wrong, right? We can snapshot online tables? +1 on reading .META. data, flushing it to .regioninfo to be sure you have the latest, and then copying that. (Or, instead, you could ensure that on any transition, the .regioninfo is updated. If this is happening, there is no need to do an extra flush of .META. at snapshot time. This latter would be better IMO.)

So, do you foresee your restore-from-snapshot running split over the logs as part of the restore? That makes sense to me.

Why do you think we need a Reference to the hfile? Why not just a file that lists the names of all the hfiles? We don't need to execute the snapshot, do we? Restoring from a snapshot would be a bunch of file renames and wal splitting? Or what are you thinking? (Oh, maybe I'll find out when I read chapter 6.)

bq. can be created just by the master.

Let's not have the master run the snapshot... let the client run it?

Shall we name the new .META. column family snapshot rather than reference? I like this idea of keeping region snapshot and reference counting beside the region up in .META.

On the filename '.deleted', I think it a mistake to give it a '.' prefix, especially given it's in the snapshot dir (the snapshot dir probably needs to be prefixed with a character illegal in tablenames, such as a '.', so it's not taken for a table directory).

Regards 'Not sure whether there will be a name collision under this .deleted directory', j-d has done work to ensure WALs are uniquely named. Storefiles are given a random id. We should probably do the extra work to ensure they are for sure unique... give them a UUID or something so we don't ever clash.

After reading chapter 6, I fail to see why we should keep References to files. Maybe I'm missing something.

bq. Not decides where to keep all the snapshots information, in a meta file under snapshot directory

Do you need a new catalog table called snapshots to keep list of snapshots, of what a snapshot comprises and some other metadata such as when it was made, whether it succeeded, who did it and why? On the other hand, a directory in hdfs of files per snapshot will be more robust.

Section 7.4 is missing split of WAL files. Perhaps this can be done in a MR job?

Design looks excellent, Li.
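The "give them a UUID or something" suggestion above could look like the following. It is a sketch only: `StoreFileNames` and `newStoreFileName` are hypothetical names, and stripping the dashes is an arbitrary formatting choice rather than an HBase convention.

```java
import java.util.UUID;

public class StoreFileNames {

    /**
     * Returns a 32-character hexadecimal name derived from a random
     * (version 4) UUID, so that two store files archived into the same
     * directory are very unlikely to collide.
     */
    public static String newStoreFileName() {
        return UUID.randomUUID().toString().replace("-", "");
    }
}
```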
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875941#action_12875941 ] stack commented on HBASE-50:

Here are some comments on the design requirements, Li:

+ Add a date, add a link to this issue to give your design context.
+ FYI, there has been talk of adding snapshots to hdfs. It's mentioned here: http://hadoop.apache.org/common/docs/current/hdfs_design.html#Snapshots. The issue is stalled at the moment: HDFS-233.
+ I don't think you should take on requirement 1), only the hbase admin can create a snapshot. There is no authentication/access control in hbase currently -- it's coming but not here yet -- and without it, this would be hard for you to enforce.
+ Regards requirement 2., I'd suggest that how the snapshot gets copied out from under hbase should also be outside the scope of your work. I'd say your work is making a viable snapshot that can be copied, with perhaps some tests to prove it works -- that might copy off data -- but in general, I'd say how actual copying is done is outside of the scope of this issue.
+ Requirement 6., resuming from a snapshot: yes, this is in scope (how the stuff is copied into place I'd argue is not. Of course, if you have the time to work on copy out and copy in functionality, great, but I'd peg this lower priority).
+ Otherwise, the requirements are great.