[jira] Assigned: (HBASE-2792) Create a better way to chain log cleaners
[ https://issues.apache.org/jira/browse/HBASE-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin reassigned HBASE-2792:
    Assignee: Li Chongxin

Create a better way to chain log cleaners
    Key: HBASE-2792
    URL: https://issues.apache.org/jira/browse/HBASE-2792
    Project: HBase
    Issue Type: Improvement
    Reporter: Jean-Daniel Cryans
    Assignee: Li Chongxin
    Fix For: 0.90.0

From Stack's review of HBASE-2223:
{quote}
Why does this implementation have to know about other implementations? Can't we do a chain of decision classes, where any class can say no? As soon as any decision class says no, we exit the chain. So in this case, first on the chain would be the ttl decision, then would be this one, and third would be the snapshotting decision. You don't have to do the chain as part of this patch, but please open an issue to implement it.
{quote}

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2792) Create a better way to chain log cleaners
[ https://issues.apache.org/jira/browse/HBASE-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891177#action_12891177 ]

Li Chongxin commented on HBASE-2792:

Here is some discussion about this issue from private mail:

bq. My idea is to replace the LogCleanerDelegate in OldLogsCleaner with a list of LogCleanerDelegates. A new LogCleanerDelegate is added to the list if it is required. The log file is then checked against each LogCleanerDelegate in the list, and it is deletable only if all of them pass. A later developer who wants to provide a new LogCleanerDelegate only has to implement the interface and add it to the list.

From Jean-Daniel's reply:

bq. I think what you described sounds good, but I would add the requirement that anybody should be able to add their own implementations without changing the code (the new class would only need to be on HBase's classpath). I think this could be done via the configuration file, as a comma-separated list of fully qualified class names.
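The chained-delegate design discussed above can be sketched roughly as follows. This is a hedged illustration, not the committed HBase API: the shape of the LogCleanerDelegate interface, the class names, and the configuration handling here are assumptions modeled on the discussion.

```java
import java.util.ArrayList;
import java.util.List;

interface LogCleanerDelegate {
    // Return true if this delegate allows the given log file to be deleted.
    boolean isLogDeletable(String logPath);
}

// Example delegate standing in for the real TTL check; always says yes here.
class TimeToLiveLogCleaner implements LogCleanerDelegate {
    public boolean isLogDeletable(String logPath) { return true; }
}

class LogCleanerChain {
    private final List<LogCleanerDelegate> delegates = new ArrayList<>();

    void add(LogCleanerDelegate d) { delegates.add(d); }

    // Build the chain from a comma-separated list of fully qualified class
    // names, as Jean-Daniel suggests; a new cleaner only needs to be on the
    // classpath and named in the configuration.
    static LogCleanerChain fromConfig(String classNames) {
        LogCleanerChain chain = new LogCleanerChain();
        for (String name : classNames.split(",")) {
            try {
                chain.add((LogCleanerDelegate)
                    Class.forName(name.trim()).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("cannot load log cleaner " + name, e);
            }
        }
        return chain;
    }

    // A file is deletable only if every delegate passes; exit on the first no.
    boolean isLogDeletable(String logPath) {
        for (LogCleanerDelegate d : delegates) {
            if (!d.isLogDeletable(logPath)) {
                return false;
            }
        }
        return true;
    }
}
```

The chain would replace the single delegate field in OldLogsCleaner; the TTL check runs first, and any later cleaner (such as a snapshot-aware one) simply vetoes deletion of files it still needs.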
[jira] Updated: (HBASE-2792) Create a better way to chain log cleaners
[ https://issues.apache.org/jira/browse/HBASE-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-2792:
    Attachment: HBASE-2792.patch
[jira] Commented: (HBASE-2792) Create a better way to chain log cleaners
[ https://issues.apache.org/jira/browse/HBASE-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891462#action_12891462 ]

Li Chongxin commented on HBASE-2792:

SnapshotLogCleaner is not included in this patch because it depends on HBase snapshots (https://issues.apache.org/jira/browse/HBASE-50), which are coming soon.
[jira] Updated: (HBASE-2792) Create a better way to chain log cleaners
[ https://issues.apache.org/jira/browse/HBASE-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-2792:
    Attachment: (was: HBASE-2792.patch)
[jira] Work started: (HBASE-2744) Start Snapshot via ZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-2744 started by Li Chongxin.

Start Snapshot via ZooKeeper
    Key: HBASE-2744
    URL: https://issues.apache.org/jira/browse/HBASE-2744
    Project: HBase
    Issue Type: Sub-task
    Components: master, regionserver
    Reporter: Li Chongxin
    Assignee: Li Chongxin
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883779#action_12883779 ]

Li Chongxin commented on HBASE-50:

bq. isSnapshot in HRI?
bq. Will keeping snapshot data in .META. work? .META. is by region but regions are deleted after a split, yet you want your snapshot to live beyond this?

Snapshot data, namely the reference counts of hfiles, will be kept in the .META. table, but in a different row than the original region row, so the reference counts will not be deleted after a split. The information is stored there because it is also region-centric: the reference counts of a region's hfiles are kept together in one .META. row, whether an hfile is still in use or has been archived. I described this in Appendix A of the document.

bq. In zk, writeZnode and readZnode ain't the best names for methods... what kinda znodes are these? (Jon says these already exist, that they are not your fault)

Actually, the snapshot methods in ZooKeeperWrapper are startSnapshotOnZK, abortSnapshotOnZK and registerRSForSnapshot. I put writeZnode and readZnode in the diagram because I can use them inside the above methods. Do you think we should make writeZnode and readZnode private and only use them inside ZooKeeperWrapper?

bq. Can you make a Snapshot class into which you encapsulate everything related to snapshotting rather than adding new data members to HMaster? Maybe you do encapsulate it all into SnapshotMonitor?

I haven't figured out all the data members in the design. I will create a Snapshot class to encapsulate the related fields if necessary during implementation.

bq. Can you call RSSnapshotHandler just SnapshotHandler?

Sure.

bq. You probably don't need to support String overloads.

You mean methods in HBaseAdmin?
A repository has been created on github with the initial content of hbase/trunk: http://github.com/lichongxin/hbase-snapshot

Snapshot of table
    Key: HBASE-50
    URL: https://issues.apache.org/jira/browse/HBASE-50
    Project: HBase
    Issue Type: New Feature
    Reporter: Billy Pearson
    Assignee: Li Chongxin
    Priority: Minor
    Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class Diagram.png

Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the dfs. This way we could save data in case of a software bug in hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read_only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that hadoop removes the need for backups to protect from failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.
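The hfile reference counting described in the comment above can be modeled as below. This is an illustrative sketch only: the real design keeps the counts in a per-region .META. row, which an in-memory map stands in for here, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// An hfile may be shared by the live table and several snapshots; it may
// only be cleaned from the archive directory when nothing references it.
class HFileRefCounter {
    private final Map<String, Integer> refs = new HashMap<>();

    // Called when the live table or a new snapshot starts using an hfile.
    void addReference(String hfile) {
        refs.merge(hfile, 1, Integer::sum);
    }

    // Called when a region or snapshot is done with an hfile. Returns true
    // when the last reference is dropped and the archived file can actually
    // be removed.
    boolean dropReference(String hfile) {
        int left = refs.merge(hfile, -1, Integer::sum);
        if (left <= 0) {
            refs.remove(hfile);
            return true;
        }
        return false;
    }
}
```

Because the counts live in their own row rather than the region's row, a region split (which deletes the parent region's row) leaves the snapshot bookkeeping intact, matching the answer above.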
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
    Attachment: Snapshot Class Diagram.png
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
    Attachment: HBase Snapshot Implementation Plan.pdf

The HBase Snapshot Implementation Plan describes the classes and methods that are going to be created and modified to support snapshots. Go over the document together with the class diagram. Any comments are welcome!
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
    Attachment: (was: snapshot-src.zip)
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880454#action_12880454 ]

Li Chongxin commented on HBASE-50:

Sure, we do need a branch for snapshots. Currently I'm working on TRUNK. Once the stuff is ready, I think we can create a new feature branch for the commit. What do you think?
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880093#action_12880093 ]

Li Chongxin commented on HBASE-50:

bq. Fail with a warning. A nice-to-have would be your suggestion of restoring a snapshot into a table named something other than the original table's name (fixing this issue is low-priority IMO).
bq. ... it's a good idea to allow snapshot restore to a new table name while the original table is still online. And the restored snapshot should be able to share HFiles with the original table.

I will make this issue a low-priority sub-task. One more question: besides metadata and log files, what other data do we need to take care of to restore the snapshot under a new table name? Are there any other files (e.g. HFiles) containing the table name?

bq. ... didn't we discuss that .META. might not be the place to keep snapshot data, since regions are deleted when the system is done w/ them (but a snapshot may outlive a particular region)?

I misunderstood; I thought you were talking about creating a new catalog table 'snapshot' to keep the metadata of snapshots, such as creation time. In the current design, a region will not be deleted if it is still used by a snapshot, even if the system is done with it. Such a region would probably be marked as 'deleted' in .META. This is discussed in sections 6.2 and 6.3, and no new catalog table is added. Do you think it is appropriate to keep metadata in .META. for a deleted region? Do we still need a new catalog table?

bq. Rather than causing all of the RS to roll the logs, they could simply record the log sequence number of the snapshot, right? This will be a bit faster and causes even less of a hiccup in concurrent operations (and I don't think it's any more complicated to implement, is it?)

Yes, sounds good. The log sequence number should also be consulted when the logs are split, because log files would contain data from both before and after the snapshot, right?

bq. Making the client orchestrate the snapshot process seems a little strange - could the client simply initiate it and put the actual snapshot code in the master? I think we should keep the client as thin as we can.

Ok, this will change the design a little.

bq. I'd be interested in a section about failure analysis - what happens when the snapshot coordinator fails in the middle? ...

That will be great!
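The "record the log sequence number instead of rolling the logs" idea agreed on above could work roughly as in this sketch. The class names and the in-memory filter are assumptions for illustration; real log splitting operates on HLog files in HDFS.

```java
import java.util.ArrayList;
import java.util.List;

// One WAL entry: its sequence id and (schematically) the row it touches.
class WalEdit {
    final long seqId;
    final String row;
    WalEdit(long seqId, String row) { this.seqId = seqId; this.row = row; }
}

// At snapshot time each region server just notes its current WAL sequence
// id; no log roll is needed, so concurrent writes barely hiccup.
class SnapshotPoint {
    final long snapshotSeqId;
    SnapshotPoint(long snapshotSeqId) { this.snapshotSeqId = snapshotSeqId; }

    // During log split for a restore, keep only edits made at or before the
    // snapshot point; later edits belong to the live table, not the snapshot.
    List<WalEdit> filterForRestore(List<WalEdit> logEdits) {
        List<WalEdit> kept = new ArrayList<>();
        for (WalEdit e : logEdits) {
            if (e.seqId <= snapshotSeqId) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

This matches the comment's point that a single archived log file can hold data from both before and after the snapshot, so the recorded sequence id must be applied during the split.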
[jira] Created: (HBASE-2745) Create snapshot of an HBase table
Create snapshot of an HBase table
    Key: HBASE-2745
    URL: https://issues.apache.org/jira/browse/HBASE-2745
    Project: HBase
    Issue Type: Sub-task
    Components: master, regionserver
    Reporter: Li Chongxin
    Assignee: Li Chongxin

Create a snapshot of an HBase table under the directory '.snapshot'.
[jira] Created: (HBASE-2746) Existing functions of HBase should be modified to maintain snapshot data
Existing functions of HBase should be modified to maintain snapshot data
    Key: HBASE-2746
    URL: https://issues.apache.org/jira/browse/HBASE-2746
    Project: HBase
    Issue Type: Sub-task
    Components: master, regionserver
    Reporter: Li Chongxin
    Assignee: Li Chongxin

Existing functions of HBase, e.g. compaction, split, table delete and the meta scanner, should be modified to take snapshot data into account.
[jira] Work started: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-50 started by Li Chongxin.
[jira] Created: (HBASE-2748) Restore snapshot to a new table name other than the original table name
Restore snapshot to a new table name other than the original table name
    Key: HBASE-2748
    URL: https://issues.apache.org/jira/browse/HBASE-2748
    Project: HBase
    Issue Type: Sub-task
    Reporter: Li Chongxin
    Assignee: Li Chongxin
    Priority: Minor
[jira] Created: (HBASE-2749) Export and Import a snapshot
Export and Import a snapshot
    Key: HBASE-2749
    URL: https://issues.apache.org/jira/browse/HBASE-2749
    Project: HBase
    Issue Type: Sub-task
    Reporter: Li Chongxin
    Assignee: Li Chongxin
    Priority: Minor
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
    Attachment: HBase Snapshot Design Report V3.pdf

The design document has been updated based on the discussion. The following changes have been made:
* Requirements have been updated.
* A snapshot can now be created for both online (enabled) tables and offline (disabled) tables. For an offline table, the snapshot is performed by the master.
* Metadata for the table is no longer copied from .regioninfo but dumped entirely from .META.
* WAL logs are now archived instead of deleted, so a snapshot no longer copies the log files but instead takes a file that lists the log names. A new section 6.5 on log maintenance has been added.
* The 'reference' family in .META. is renamed to 'snapshot'.
* The same column family 'snapshot' is added to -ROOT- so that .META. can be snapshotted too.
* A new file .snapshotinfo is created under each snapshot dir to keep the meta information of the snapshot. The list operation for snapshots will read this meta file.
* A new operation 'Restore' is added to restore a table from a snapshot in the same data center.
* Export and import have changed. They are used to export a snapshot to, or import a snapshot from, other data centers. An exported snapshot therefore has the same file format as an exported table, so we can treat an exported snapshot the same as an exported table and import it with the same import facility.

Pending questions:
1. What if a table with the same name is still online when we want to restore a snapshot? There will be a name collision in both HDFS and .META. We should not touch the existing table, right?
2. Shall we then allow renaming the snapshot to a new table name? For example, if the snapshot was created for table1, can we restore the snapshot as table2?
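The ".snapshotinfo" file proposed in the update above could be modeled like this. The key=value layout, field set and class name are assumptions for illustration, not the actual on-disk format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One small metadata file per snapshot directory under '.snapshot'; the
// "list snapshots" operation just reads each directory's .snapshotinfo.
class SnapshotInfo {
    final String table;
    final long creationTime;

    SnapshotInfo(String table, long creationTime) {
        this.table = table;
        this.creationTime = creationTime;
    }

    // Serialize to the hypothetical key=value file contents.
    String serialize() {
        return "table=" + table + "\ncreationTime=" + creationTime + "\n";
    }

    // Parse the file contents back into a SnapshotInfo.
    static SnapshotInfo parse(String content) {
        Map<String, String> kv = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                kv.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return new SnapshotInfo(kv.get("table"), Long.parseLong(kv.get("creationTime")));
    }
}
```

Keeping such metadata (creation time, source table, success flag) in the snapshot directory itself sidesteps the pending question of whether a catalog table is needed just to list snapshots.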
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876699#action_12876699 ]

Li Chongxin commented on HBASE-50:

bq. ... but also after snapshot is done your design should include a description of how files are archived, rather than deleted...

Are you talking about files that are no longer used by the hbase table but are still referenced by a snapshot? I think this is described in chapter 6, 'Snapshot Maintenance'. For example, hfiles are archived in the delete directory, and section 6.4 describes how these files will be cleaned up.

bq. ... In fact you'll probably be doing a snapshot of at least a subset of .META. on every table snapshot I'd imagine - at least the entries for the relevant table.

The .META. entries for the snapshotted table have been dumped, haven't they? Why do we still need a snapshot of a subset of .META.?

bq. So, do you foresee your restore-from-snapshot running split over the logs as part of the restore? That makes sense to me.

Yes, restore-from-snapshot has to run split over the WAL logs. It will take some time, so restore-from-snapshot will not be very fast.

bq. Why do you think we need a Reference to the hfile? Why not just a file that lists the names of all the hfiles? We don't need to execute the snapshot, do we? Restoring from a snapshot would be a bunch of file renames and wal splitting?

At first I thought a snapshot should probably keep the table directory structure for later use; for example, a reader like HalfStoreFileReader could be provided so that we could read from the snapshot directly. But yes, we don't actually execute the snapshot, so keeping a list of all the hfiles (actually one list per RS, right?) should be enough. Also, restoring from a snapshot is not just file renames: since an hfile might be referenced by several snapshots, we should probably do a real copy when restoring, right?

bq. Shall we name the new .META. column family snapshot rather than reference?

Sure.

bq. On the filename '.deleted', I think it a mistake to give it a '.' prefix, especially given it's in the snapshot dir...

Ok, I will rename the snapshot dir to '.snapshot'. For the dir '.deleted', what name do you think we should use? Because there might be several snapshots under the dir '.snapshot', each with a snapshot name, I named this dir '.deleted' to distinguish it from a snapshot name.

bq. Do you need a new catalog table called snapshots to keep the list of snapshots, of what a snapshot comprises and some other metadata such as when it was made, whether it succeeded, who did it and why?

It would be much more convenient if a catalog table 'snapshot' could be created. Would this impact the normal operation of hbase?

bq. Section 7.4 is missing split of WAL files. Perhaps this can be done in a MR job?

I'll add the split of WAL logs. Yes, a MR job can be used. Which method do you think is better: read from the imported file and insert into the table via the hbase api, or just copy the hfiles into place and update .META.?

bq. Lets not have the master run the snapshot... let the client run it?
bq. Snapshot will be doing same thing whether table is partially online or not..

I put these two issues together because I think they are related. In the current design, if a table is open, the snapshot is performed by each RS that serves the table's regions. Otherwise, if a table is closed, the snapshot is performed by the master, because the table is not served by any RS. The first comment is about a closed table, so the master would perform the snapshot because the client does not have access to the underlying dfs. For the second one, I was thinking that if a table is partially online, its regions might be partially served by RSs and partially offline, right? Then who performs the snapshot? If the RSs, the regions that are offline will be missed. If the master, the regions that are online might lose data in the memstore. I'm confused...

bq. It's a synchronous way. Do you think this is appropriate? Yes. I'm w/ JG on this.

This is another problem confusing me. In the current design (which is synchronous), a snapshot is started when all the RSs are ready for the snapshot; then all RSs perform the snapshot concurrently. This guarantees the snapshot is not started if one RS fails. If we switch to an asynchronous approach, should an RS start the snapshot immediately when it is ready?
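The "file that lists the names of all the hfiles, one list per RS" alternative discussed above, together with the point that restore must copy rather than rename (an hfile can be referenced by several snapshots), can be sketched as follows. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A snapshot manifest: instead of per-file Reference objects, just record
// the hfile paths each region server was serving at snapshot time.
class SnapshotManifest {
    private final Map<String, List<String>> filesByServer = new HashMap<>();

    void record(String server, String hfilePath) {
        filesByServer.computeIfAbsent(server, s -> new ArrayList<>()).add(hfilePath);
    }

    // Restore copies every listed hfile into the new table's directory.
    // Copying is simulated here by returning (source, destination) pairs;
    // a real restore would also split the WAL logs.
    List<String[]> restorePlan(String destTableDir) {
        List<String[]> plan = new ArrayList<>();
        for (List<String> files : filesByServer.values()) {
            for (String f : files) {
                String name = f.substring(f.lastIndexOf('/') + 1);
                plan.add(new String[] { f, destTableDir + "/" + name });
            }
        }
        return plan;
    }
}
```

The copy (rather than rename) in the restore plan is what lets the same hfile remain referenced by other snapshots and by the live table.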
[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876259#action_12876259 ]

Li Chongxin commented on HBASE-50:
--

@Stack, Thanks for the comments. Here are some replies and questions.

bq. + I don't think you should take on requirement 1), only the hbase admin can create a snapshot. There is no authentication/access control in hbase currently - its coming but not here yet - and without it, this would be hard for you to enforce.

I think I didn't state it properly. I know access control is not included in HBase currently. What I mean is that snapshot should be put in the class HBaseAdmin instead of HTable. Dividing client-side operations between these two classes is also done in consideration of the access control to be provided in the future, isn't it?

bq. + Regards requirement 2., I'd suggest that how the snapshot gets copied out from under hbase should also be outside the scope of your work. I'd say your work is making a viable snapshot that can be copied with perhaps some tests to prove it works - that might copy off data - but in general, i'd say how actual copying is done is outside of the scope of this issue.

Strictly, requirement 2 is not about how a snapshot is copied out from under HBase. Actually, table data is not really copied during a snapshot in the current design. To make it fast, a snapshot just captures the state of the table, especially all the table files. So requirement 2 just ensures that the table data (the hfiles, in fact) is not mutated during a snapshot.

bq. + How you going to ensure table is in 'good status'. Can you not snapshot it whatever its state? All regions being on line is a requirement?

Regarding disabled tables, all regions being online should not be a requirement. As for 'good status', what I'm thinking is that a table region could be in the PENDING_OPEN or PENDING_CLOSE state, in which it might be half opened. I'm not sure whether the RS or the master should take on the responsibility of performing the snapshot at that time. On the other hand, if the table is completely opened or closed, the snapshot can be taken by the RSs or the master respectively.

bq. + FYI, wal logs are now archived, not deleted. Replication needs them. Replication might also be managing clean up of the archives (j-d, whats the story here?) If an outstanding snapshot, one that has not been deleted, then none of its wals should be removed.

Great. In the current design, WAL log files are the only data files that are really copied. If they are now archived instead of deleted, we can create references to log files, just as for hfiles, instead of copying the actual data. This will further shorten the snapshot time. Another LogCleanerDelegate, say ReferencedLogCleaner, could be created to check whether a log file may be deleted, taking outstanding snapshots into consideration. What do you think?

bq. + I can say 'snapshot' all tables? Can I say 'snapshot catalog tables - meta and root tables?'

I think snapshotting .META. works fine, but snapshotting the root table is a little tricky. When a snapshot is performed for a user table, .META. is updated to keep track of the file references. If the .META. table is snapshotted, -ROOT- can be updated to keep track of the file references. But where do we keep the file references for the -ROOT- table (region) if it is snapshotted, still in -ROOT-? Should this newly updated file-reference information also be included in the snapshot?

bq. + If a RS fails between 'ready' and 'finish', does this mean we abandon the snapshot?

Yes. If an RS fails between 'ready' and 'finish', it should notify the client or master, whichever orchestrates; then the client or the master will send a signal via ZK to stop the snapshot on all RSs. Something like this.

bq. + I'd say if RS is not ready for snapshot, just fail it. Something is badly wrong if a RS can't snapshot.

Currently, there is a timeout for snapshot readiness. If an RS is ready, it will wait for all the RSs to be ready; then the snapshot starts on all RSs. Otherwise, the ready RSs time out and the snapshot does not start on any RS. It's a synchronous way. Do you think this is appropriate? Will it create too much load to perform the snapshot concurrently on the RSs? (Jonathan prefers an asynchronous method.)

bq. + Would it make sense for there to be a state between ready and finish and the data in this intermediate state would be the RS's progress?

Do you mean a znode is created for each RS to keep track of the progress? Then how do you define the RS's progress? What data would be kept in this znode?

Thanks again for the comments. I will update the design document based on them.

Snapshot of table
Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
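The ReferencedLogCleaner idea above, combined with the chain of decision classes proposed in HBASE-2792, can be sketched in plain Java. The interface shape and all class names below are assumptions drawn from this discussion, not the actual HBase API: each delegate gets a veto, and the first "no" exits the chain.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Assumed shape of the delegate interface discussed in HBASE-2792.
interface LogCleanerDelegate {
    boolean isLogDeletable(String logFile);
}

// Hypothetical TTL cleaner: a log is deletable once older than the TTL.
// For this sketch, file names are assumed to end in ".<creationTimestamp>".
class TtlLogCleaner implements LogCleanerDelegate {
    private final long ttlMs;
    private final long now;

    TtlLogCleaner(long ttlMs, long now) {
        this.ttlMs = ttlMs;
        this.now = now;
    }

    public boolean isLogDeletable(String logFile) {
        long ts = Long.parseLong(logFile.substring(logFile.lastIndexOf('.') + 1));
        return now - ts > ttlMs;
    }
}

// Hypothetical ReferencedLogCleaner from the comment above: a log still
// referenced by an outstanding snapshot must not be deleted.
class ReferencedLogCleaner implements LogCleanerDelegate {
    private final Set<String> referenced;

    ReferencedLogCleaner(Set<String> referenced) {
        this.referenced = referenced;
    }

    public boolean isLogDeletable(String logFile) {
        return !referenced.contains(logFile);
    }
}

class LogCleanerChain {
    private final List<LogCleanerDelegate> delegates = new ArrayList<>();

    void add(LogCleanerDelegate d) {
        delegates.add(d);
    }

    // A log is deletable only if every delegate agrees; any "no" exits the chain.
    boolean isLogDeletable(String logFile) {
        for (LogCleanerDelegate d : delegates) {
            if (!d.isLogDeletable(logFile)) {
                return false;
            }
        }
        return true;
    }
}
```

Per Jean-Daniel's suggestion in HBASE-2792, the chain members would be instantiated by reflection from a comma-separated list of fully qualified class names in the configuration, so new delegates can be plugged in without code changes.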
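On the last question above, one possible answer to "how do you define the RS's progress" is a regions-done counter per region server, which is the kind of data an intermediate-state znode could carry. This is purely a speculative sketch of that open design question; every name here is hypothetical.

```java
// Hypothetical per-RS progress record for the state between 'ready' and
// 'finish': how many of this RS's regions have been snapshotted so far.
class SnapshotProgress {
    enum State { READY, IN_PROGRESS, FINISHED }

    private final int totalRegions;
    private int doneRegions = 0;

    SnapshotProgress(int totalRegions) {
        this.totalRegions = totalRegions;
    }

    // Called each time this RS finishes snapshotting one region.
    void regionDone() {
        if (doneRegions < totalRegions) {
            doneRegions++;
        }
    }

    State state() {
        if (doneRegions == 0) {
            return State.READY;
        }
        return doneRegions < totalRegions ? State.IN_PROGRESS : State.FINISHED;
    }

    // Serialized form a znode could carry, e.g. "3/10".
    String asZnodeData() {
        return doneRegions + "/" + totalRegions;
    }
}
```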
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
-
Attachment: (was: snapshot-flowchart.png)

Snapshot of table
Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: snapshot-src.zip

Having an option to take a snapshot of a table would be very useful in production. What I would like to see this option do is to do a merge of all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect from failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-50:
-
Attachment: HBase Snapshot Design Report V2.pdf

Snapshot of table
Key: HBASE-50
URL: https://issues.apache.org/jira/browse/HBASE-50
Project: HBase
Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
[jira] Updated: (HBASE-2637) Web UI error if the table is disabled
[ https://issues.apache.org/jira/browse/HBASE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Chongxin updated HBASE-2637:
---
Attachment: hbase_issue.png

However, if the table is disabled after HBase has started, everything works fine.

Web UI error if the table is disabled
Key: HBASE-2637
URL: https://issues.apache.org/jira/browse/HBASE-2637
Project: HBase
Issue Type: Bug
Affects Versions: 0.20.4
Reporter: Li Chongxin
Priority: Minor
Attachments: hbase_issue.png

If a table is disabled when HBase is started up, an HTTP ERROR 500 occurs when viewing the table in table.jsp.