[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
-----------------------------
    Attachment: Snapshot Class Diagram.png

Snapshot of table
-----------------
                Key: HBASE-50
                URL: https://issues.apache.org/jira/browse/HBASE-50
            Project: HBase
         Issue Type: New Feature
           Reporter: Billy Pearson
           Assignee: Li Chongxin
           Priority: Minor
        Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, Snapshot Class Diagram.png, snapshot-src.zip

Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save the data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but that does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
-----------------------------
    Attachment: HBase Snapshot Implementation Plan.pdf

The HBase Snapshot Implementation Plan describes the classes and methods that will be created or modified to support snapshots. Go over the document together with the class diagram. Any comments are welcome!

Snapshot of table
-----------------
                Key: HBASE-50
                URL: https://issues.apache.org/jira/browse/HBASE-50
            Project: HBase
         Issue Type: New Feature
           Reporter: Billy Pearson
           Assignee: Li Chongxin
           Priority: Minor
        Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
-----------------------------
    Attachment: (was: snapshot-src.zip)

Snapshot of table
-----------------
                Key: HBASE-50
                URL: https://issues.apache.org/jira/browse/HBASE-50
            Project: HBase
         Issue Type: New Feature
           Reporter: Billy Pearson
           Assignee: Li Chongxin
           Priority: Minor
        Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png
[jira] Commented: (HBASE-2789) Propagate HBase config from Master to region servers
[ https://issues.apache.org/jira/browse/HBASE-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882551#action_12882551 ]

Ted Yu commented on HBASE-2789:
-------------------------------

Yes.

Propagate HBase config from Master to region servers
-----------------------------------------------------
                Key: HBASE-2789
                URL: https://issues.apache.org/jira/browse/HBASE-2789
            Project: HBase
         Issue Type: Improvement
         Components: master
   Affects Versions: 0.20.3
           Reporter: Ted Yu

If the HBase config is modified while the cluster is running, the changes don't propagate to the region servers after the cluster restarts. This is different from Hadoop's behavior, where changes are automatically copied to the data nodes. This feature is desirable when enabling JMX, for example.
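For illustration, here is a minimal sketch of one way a master could push configuration to region servers through ZooKeeper. The znode path, the string serialization, and the reloadConfig() hook are assumptions made up for this example, not HBase's actual mechanism; only the plain ZooKeeper client calls are real API.

{code}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class ConfigPropagationSketch {
  private static final String CONF_ZNODE = "/hbase/conf"; // hypothetical path

  /** Master side: write the serialized config into the znode. */
  static void publish(ZooKeeper zk, String serializedConf) throws Exception {
    byte[] data = serializedConf.getBytes(StandardCharsets.UTF_8);
    if (zk.exists(CONF_ZNODE, false) == null) {
      zk.create(CONF_ZNODE, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.PERSISTENT);
    } else {
      zk.setData(CONF_ZNODE, data, -1); // -1 matches any version
    }
  }

  /** Region server side: read the config and watch the znode for changes. */
  static void watchForChanges(final ZooKeeper zk) throws Exception {
    Watcher watcher = new Watcher() {
      @Override public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
          try {
            watchForChanges(zk); // ZK watches are one-shot: re-read, re-register
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }
    };
    byte[] data = zk.getData(CONF_ZNODE, watcher, new Stat());
    reloadConfig(new String(data, StandardCharsets.UTF_8)); // hypothetical hook
  }

  static void reloadConfig(String conf) {
    System.out.println("Reloading config: " + conf);
  }
}
{code}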
[jira] Commented: (HBASE-2707) Can't recover from a dead ROOT server if any exception happens during log splitting
[ https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882638#action_12882638 ]

Jean-Daniel Cryans commented on HBASE-2707:
-------------------------------------------

So actually the code of process() looks like:

{code}
LOG.info("Log split complete, meta reassignment and scanning:");
if (this.isRootServer) {
  LOG.info("ProcessServerShutdown reassigning ROOT region");
  master.getRegionManager().reassignRootRegion();
  isRootServer = false;  // prevent double reassignment... heh.
}
for (MetaRegion metaRegion : metaRegions) {
  LOG.info("ProcessServerShutdown setting to unassigned: " + metaRegion.toString());
  master.getRegionManager().setUnassigned(metaRegion.getRegionInfo(), true);
}
// Once the meta regions are online, forget about them. Since there are explicit
// checks below to make sure meta/root are online, this is likely to occur.
metaRegions.clear();
if (!rootAvailable()) {
  // Return true so that worker does not put this request back on the
  // toDoQueue.
  // rootAvailable() has already put it on the delayedToDoQueue
  return true;
}
if (!rootRescanned) {
  // Scan the ROOT region
  Boolean result = new ScanRootRegion(
      new MetaRegion(master.getRegionManager().getRootRegionLocation(),
          HRegionInfo.ROOT_REGIONINFO), this.master).doWithRetries();
  if (result == null) {
    // Master is closing - give up
    return true;
  }
  if (LOG.isDebugEnabled()) {
    LOG.debug("Process server shutdown scanning root region on " +
        master.getRegionManager().getRootRegionLocation().getBindAddress() +
        " finished " + Thread.currentThread().getName());
  }
  rootRescanned = true;
}
{code}

So if the RS had -ROOT-, it will be reassigned right away, and then the method returns if !rootAvailable(). Later, when we come back and ROOT has been assigned, ProcessServerShutdown will finish its job. This is how the code you pasted succeeds.

Can't recover from a dead ROOT server if any exception happens during log splitting
------------------------------------------------------------------------------------
                Key: HBASE-2707
                URL: https://issues.apache.org/jira/browse/HBASE-2707
            Project: HBase
         Issue Type: Bug
           Reporter: Jean-Daniel Cryans
           Assignee: stack
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: HBASE-2707.patch

There's an almost easy way to get stuck after a RS holding ROOT dies, usually from a GC-like event. It happens frequently to my TestReplication in HBASE-2223. Some logs:

{code}
2010-06-10 11:35:52,090 INFO  [master] wal.HLog(1175): Spliting is done. Removing old log dir hdfs://localhost:55814/user/jdcryans/.logs/10.10.1.63,55846,1276194933831
2010-06-10 11:35:52,095 WARN  [master] master.RegionServerOperationQueue(183): Failed processing: ProcessServerShutdown of 10.10.1.63,55846,1276194933831; putting onto delayed todo queue
java.io.IOException: Cannot delete: hdfs://localhost:55814/user/jdcryans/.logs/10.10.1.63,55846,1276194933831
    at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1179)
    at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:298)
    at org.apache.hadoop.hbase.master.RegionServerOperationQueue.process(RegionServerOperationQueue.java:149)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:456)
Caused by: java.io.IOException: java.io.IOException: /user/jdcryans/.logs/10.10.1.63,55846,1276194933831 is non empty
2010-06-10 11:35:52,097 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
2010-06-10 11:35:53,098 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
2010-06-10 11:35:53,523 INFO  [main.serverMonitor] master.ServerManager$ServerMonitor(131): 1 region servers, 1 dead, average load 14.0[10.10.1.63,55846,1276194933831]
2010-06-10 11:35:54,099 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
2010-06-10 11:35:55,101 DEBUG [master] master.RegionServerOperationQueue(126): -ROOT- isn't online, can't process delayedToDoQueue items
{code}

The last lines are my own debug. Since we don't process the delayed todo if ROOT isn't online, we'll never reassign the regions.
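To make the circular wait in the logs above concrete, here is a self-contained toy model of the queue logic (the queue contents and the rootOnline flag are stand-ins made up for this sketch, not HBase source): the delayed queue is only drained once -ROOT- is online, but the only operation that can bring -ROOT- back online is the one parked in that very queue.

{code}
import java.util.ArrayDeque;
import java.util.Queue;

public class RootDeadlockSketch {
  static boolean rootOnline = false;

  public static void main(String[] args) {
    Queue<Runnable> delayedToDoQueue = new ArrayDeque<>();
    // ProcessServerShutdown failed (log splitting threw an IOException)
    // and was parked on the delayed queue instead of being retried:
    delayedToDoQueue.add(() -> rootOnline = true); // only this op restores -ROOT-

    for (int i = 0; i < 5; i++) {          // stand-in for the master's main loop
      if (rootOnline) {
        delayedToDoQueue.poll().run();     // never reached: nothing else sets the flag
      } else {
        System.out.println("-ROOT- isn't online, can't process delayedToDoQueue items");
      }
    }
  }
}
{code}

Run it and it prints the DEBUG line forever (bounded to five iterations here), which is exactly the tail of the log excerpt above.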
[jira] Commented: (HBASE-2707) Can't recover from a dead ROOT server if any exception happens during log splitting
[ https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882649#action_12882649 ]

stack commented on HBASE-2707:
------------------------------

So it's broken then? We assign -ROOT- but don't recover its edits?

Can't recover from a dead ROOT server if any exception happens during log splitting
------------------------------------------------------------------------------------
                Key: HBASE-2707
                URL: https://issues.apache.org/jira/browse/HBASE-2707
            Project: HBase
         Issue Type: Bug
           Reporter: Jean-Daniel Cryans
           Assignee: stack
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: HBASE-2707.patch
[jira] Commented: (HBASE-2707) Can't recover from a dead ROOT server if any exception happens during log splitting
[ https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882653#action_12882653 ]

stack commented on HBASE-2707:
------------------------------

Hmm... chatted with J-D and he points out that the above runs AFTER the logs are split, so I had it incorrect. The above should be good.

Can't recover from a dead ROOT server if any exception happens during log splitting
------------------------------------------------------------------------------------
                Key: HBASE-2707
                URL: https://issues.apache.org/jira/browse/HBASE-2707
            Project: HBase
         Issue Type: Bug
           Reporter: Jean-Daniel Cryans
           Assignee: stack
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: HBASE-2707.patch
[jira] Resolved: (HBASE-2790) Purge apache-forrest from TRUNK
[ https://issues.apache.org/jira/browse/HBASE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2790.
--------------------------
        Assignee: stack
   Fix Version/s: 0.21.0
      Resolution: Fixed

Committed. Removed the top-level docs dir (it's generated). While here, removed the building of test and source jars into the -bin.tgz bundle.

Purge apache-forrest from TRUNK
-------------------------------
                Key: HBASE-2790
                URL: https://issues.apache.org/jira/browse/HBASE-2790
            Project: HBase
         Issue Type: Task
           Reporter: stack
           Assignee: stack
            Fix For: 0.21.0

Remove all of the apache-forrest dirs from TRUNK. We don't do apache-forrest any more; we use Maven to generate our site.
[jira] Created: (HBASE-2791) Stop dumping exceptions coming from ZK and do nothing about them
Stop dumping exceptions coming from ZK and do nothing about them
-----------------------------------------------------------------
                Key: HBASE-2791
                URL: https://issues.apache.org/jira/browse/HBASE-2791
            Project: HBase
         Issue Type: Improvement
           Reporter: Jean-Daniel Cryans
            Fix For: 0.21.0

I think this is part of the Master/ZooKeeper refactoring project, but I'm putting it up here to be sure we cover it. Currently in ZKW (and other places around the code base) we do ZK operations and don't really handle the exceptions, for example in ZKW.setClusterState:

{code}
} catch (InterruptedException e) {
  LOG.warn("<" + instanceName + ">" + " Failed to set state node in ZooKeeper", e);
} catch (KeeperException e) {
  if (e.code() == KeeperException.Code.NODEEXISTS) {
    LOG.debug("<" + instanceName + ">" + " State node exists.");
  } else {
    LOG.warn("<" + instanceName + ">" + " Failed to set state node in ZooKeeper", e);
  }
}
{code}

It has always been like that since we started using ZK. What if the session was expired? What if it was only the connection that had a blip? Do we handle it correctly? We need to have this discussion.
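As a starting point for that discussion, here is a hedged sketch of what differentiated handling could look like (the retry count, the znode handling, and the abort() hook are assumptions for this example, not something proposed in the issue): a connection blip is retried a few times, while an expired session aborts loudly instead of being logged and ignored.

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkWriteWithRetrySketch {
  static void setClusterState(ZooKeeper zk, String path, byte[] data)
      throws KeeperException, InterruptedException {
    int retries = 3;
    while (true) {
      try {
        if (zk.exists(path, false) == null) {
          zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
          zk.setData(path, data, -1); // -1 matches any version
        }
        return;
      } catch (KeeperException.NodeExistsException e) {
        return;                        // someone else created it; nothing to do
      } catch (KeeperException.ConnectionLossException e) {
        if (retries-- == 0) throw e;   // a blip: retry a few times, then give up
      } catch (KeeperException.SessionExpiredException e) {
        abort(e);                      // unrecoverable: fail loudly, don't just log
        throw e;
      }
    }
  }

  static void abort(Exception cause) {
    System.err.println("ZK session expired, aborting: " + cause);
  }
}
{code}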
[jira] Created: (HBASE-2792) Create a better way to chain log cleaners
Create a better way to chain log cleaners
-----------------------------------------
                Key: HBASE-2792
                URL: https://issues.apache.org/jira/browse/HBASE-2792
            Project: HBase
         Issue Type: Improvement
           Reporter: Jean-Daniel Cryans
            Fix For: 0.21.0

From Stack's review of HBASE-2223:

{quote}
Why does this implementation have to know about other implementations? Can't we do a chain of decision classes, where any class can say no? As soon as any decision class says no, we exit the chain. So in this case, first on the chain would be the ttl decision... then would be this one... and third would be the snapshotting decision. You don't have to do the chain as part of this patch, but please open an issue to implement it.
{quote}
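A hedged sketch of what such a chain could look like (all names here — the LogCleanerDelegate interface, the sample delegates, and the chain class — are made up for illustration, not the eventual HBase API): each cleaner only answers yes or no for a given log file, and the first "no" short-circuits the chain.

{code}
import java.util.List;

interface LogCleanerDelegate {
  /** @return true if this delegate allows the log file to be deleted. */
  boolean isLogDeletable(String logFilePath);
}

class TimeToLiveLogCleaner implements LogCleanerDelegate {
  public boolean isLogDeletable(String logFilePath) {
    return true; // e.g. check the file's age against a configured TTL
  }
}

class ReplicationLogCleaner implements LogCleanerDelegate {
  public boolean isLogDeletable(String logFilePath) {
    return true; // e.g. check the log is no longer queued for replication
  }
}

class LogCleanerChain {
  private final List<LogCleanerDelegate> delegates;

  LogCleanerChain(List<LogCleanerDelegate> delegates) {
    this.delegates = delegates;
  }

  boolean isLogDeletable(String logFilePath) {
    for (LogCleanerDelegate d : delegates) {
      if (!d.isLogDeletable(logFilePath)) {
        return false; // first "no" wins; exit the chain
      }
    }
    return true; // every delegate said yes
  }
}
{code}

Registering the TTL cleaner first, then this one, then a snapshot cleaner would give the ordering Stack describes, e.g. new LogCleanerChain(List.of(new TimeToLiveLogCleaner(), new ReplicationLogCleaner())), and no delegate needs to know about the others.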
[jira] Updated: (HBASE-2707) Can't recover from a dead ROOT server if any exception happens during log splitting
[ https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2707:
-------------------------
    Attachment: 2707-test.txt

A test that puts off the processing of the shutdown of the server that was carrying ROOT. Without the patch this test never completes; with the patch in place, it does.

Can't recover from a dead ROOT server if any exception happens during log splitting
------------------------------------------------------------------------------------
                Key: HBASE-2707
                URL: https://issues.apache.org/jira/browse/HBASE-2707
            Project: HBase
         Issue Type: Bug
           Reporter: Jean-Daniel Cryans
           Assignee: stack
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: 2707-test.txt, HBASE-2707.patch
[jira] Resolved: (HBASE-2707) Can't recover from a dead ROOT server if any exception happens during log splitting
[ https://issues.apache.org/jira/browse/HBASE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2707.
--------------------------
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks for the review, J-D (I removed DELAY altogether).

Can't recover from a dead ROOT server if any exception happens during log splitting
------------------------------------------------------------------------------------
                Key: HBASE-2707
                URL: https://issues.apache.org/jira/browse/HBASE-2707
            Project: HBase
         Issue Type: Bug
           Reporter: Jean-Daniel Cryans
           Assignee: stack
           Priority: Blocker
            Fix For: 0.21.0
        Attachments: 2707-test.txt, HBASE-2707.patch
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882779#action_12882779 ]

stack commented on HBASE-50:
----------------------------

I'll make a branch to host Li's work going forward.

Snapshot of table
-----------------
                Key: HBASE-50
                URL: https://issues.apache.org/jira/browse/HBASE-50
            Project: HBase
         Issue Type: New Feature
           Reporter: Billy Pearson
           Assignee: Li Chongxin
           Priority: Minor
        Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png