[ https://issues.apache.org/jira/browse/HAWQ-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624133#comment-15624133 ]
Lili Ma edited comment on HAWQ-1034 at 11/1/16 2:44 AM:
--------------------------------------------------------
Repair mode can be thought of as a particular case of force mode.

1) Force mode registers the files according to the yaml configuration file: it erases all the records in the catalog (pg_aoseg.pg_aoseg(paqseg)_$relid) and re-inserts them. It requires that all HDFS files for the table be included in the yaml configuration file.

2) Repair mode also registers files according to the yaml configuration file, erasing the catalog records and re-inserting them, but it does not require that all HDFS files for the table be included in the yaml configuration file. It directly deletes any files that are under the table directory but not included in the yaml configuration file.

Since repair mode may directly delete HDFS files, it carries some risk: if a user runs repair mode by mistake, his/her data may be deleted. We can instead allow users to use force mode and throw an error for files that are under the table directory but not included in the yaml configuration file. If the user decides those files are unnecessary, he/she can delete them manually.

The workaround for supporting repair mode with the --force option:
1) If no files have been added since the last checkpoint, at which the yaml configuration file was generated, force mode can handle the registration directly.
2) If some files have been added since the last checkpoint and the user does want to delete them, force mode can output the information for those files so that users can delete them themselves and then run register in force mode again.

Since we can use force mode to implement the repair feature, we will remove the existing code for repair mode and close this JIRA. Thanks

> add --repair option for hawq register
> -------------------------------------
>
> Key: HAWQ-1034
> URL: https://issues.apache.org/jira/browse/HAWQ-1034
> Project: Apache HAWQ
> Issue Type: Sub-task
> Components: Command Line Tools
> Affects Versions: 2.0.1.0-incubating
> Reporter: Lili Ma
> Assignee: Chunling Wang
> Fix For: 2.0.1.0-incubating
>
> add --repair option for hawq register
> Will change both the file folder and the catalog table pg_aoseg.pg_paqseg_$relid to the state that the .yml file configures. Note that some files newly generated since the checkpoint may be deleted here.
> Also note that all the files in the .yml file should be under the table folder on HDFS.
> Limitation: does not support the cases of hash table redistribution, table truncate, and table drop.
> This is for the table rollback scenario: take checkpoints somewhere, then roll back to a previous checkpoint when needed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
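The force-mode check proposed in the comment above (throw an error for files under the table directory that are absent from the yaml configuration file, and print them so the user can delete them manually and rerun register --force) could be sketched as follows. This is an illustrative sketch only, not the actual hawq register implementation; the flat lists of file paths stand in for the real yaml parsing and HDFS directory listing.

```python
# Sketch of the proposed behavior: force mode refuses to touch files that are
# under the table directory on HDFS but missing from the yaml configuration
# file, instead of silently deleting them as repair mode would have done.
# The inputs are simplified assumptions: plain lists of file paths rather
# than the real yaml structure and HDFS listing.

def find_unregistered_files(yaml_files, hdfs_files):
    """Return files present under the table directory but absent from the
    yaml configuration file, sorted for stable error output."""
    registered = set(yaml_files)
    return sorted(f for f in hdfs_files if f not in registered)

def force_mode_check(yaml_files, hdfs_files):
    """Raise an error listing unregistered files; the user can then delete
    them manually and run register in force mode again."""
    extras = find_unregistered_files(yaml_files, hdfs_files)
    if extras:
        raise RuntimeError(
            "files under the table directory but not in the yaml "
            "configuration file: " + ", ".join(extras))
    return True
```

For example, if the yaml file lists `/hawq_default/16385/1.1` and `/hawq_default/16385/2.1` but the table directory also contains `/hawq_default/16385/3.1` (a file added since the checkpoint), the check reports `3.1` and aborts, which matches workaround step 2 in the comment.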