Thanks Geoff.  Mind making a JIRA and attaching the code as a patch?
Copying and pasting from email might not work so well.  Thanks boss,
St.Ack

On Mon, Oct 31, 2011 at 10:46 AM, Geoff Hendrey <[email protected]> wrote:
> Hi Guys -
>
> This is a fairly complete little Tool (Configured) whose purpose is to move
> a whole slew of regions out into a backup directory and restore .META. when
> done. We found we needed this when a huge volume of keys had been generated
> into a production table and it turned out the whole set of keys had an
> incorrect prefix; what we really wanted was to move the data out of all
> those regions into a backup directory in one fell swoop. The tool accepts
> its parameters via -D (hadoop arguments). It removes a slew of contiguous
> regions, relinks .META., and places the removed data in a backup directory
> in HDFS. It has been tested on big tables and catches some of the subtler
> gotchas, like parsing region names carefully to guard against row keys that
> themselves contain commas. It worked for me, but use at your own risk.
>
> Basically you give it -Dregion.remove.regionname.start=STARTREGION and
> -Dregion.remove.regionname.end=ENDREGION, where STARTREGION and ENDREGION
> are region names, and all the data between STARTREGION and ENDREGION will
> be moved out of your table.
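>
> For example, an invocation might look something like this (the jar name is
> just a placeholder for however you package the class, and
> region.remove.backupdir is optional; it defaults to a timestamped
> directory):
>
>   hadoop jar remove-regions.jar RemoveRegions \
>     -Dregion.remove.regionname.start=STARTREGION \
>     -Dregion.remove.regionname.end=ENDREGION \
>     -Dregion.remove.backupdir=/backups/regionBackup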
>
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HConstants;
> import org.apache.hadoop.hbase.HRegionInfo;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.hbase.util.FSUtils;
> import org.apache.hadoop.hbase.util.Writables;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
>
> /**
>  * @author ghendrey
>  */
> public class RemoveRegions extends Configured implements Tool {
>
>     public static void main(String[] args) throws Exception {
>         int exitCode = ToolRunner.run(new RemoveRegions(), args);
>         System.exit(exitCode);
>     }
>
>     private static void deleteMetaRow(HRegionInfo closedRegion, HTable hMetaTable) throws IOException {
>         // Delete the region's original row from .META.
>         Delete del = new Delete(closedRegion.getRegionName());
>         hMetaTable.delete(del);
>         System.out.println("Deleted the region's row from .META. " + closedRegion.getRegionNameAsString());
>     }
>
>     private static HRegionInfo closeRegion(Result result, HBaseAdmin admin) throws RuntimeException, IOException {
>         byte[] bytes = result.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER);
>         HRegionInfo closedRegion = Writables.getHRegionInfo(bytes);
>         try {
>             // Close the existing region if it is open.
>             admin.closeRegion(closedRegion.getRegionName(), null);
>             System.out.println("Closed the region " + closedRegion.getRegionNameAsString());
>         } catch (Exception e) {
>             System.out.println("Skipped closing the region because: " + e.getMessage());
>         }
>         return closedRegion;
>     }
>
>     private static HRegionInfo getRegionInfo(String regionName, Configuration hConfig) throws IOException {
>         // Read the region's existing HRegionInfo out of its .META. row.
>         HTable readTable = new HTable(hConfig, Bytes.toBytes(".META."));
>         Get readGet = new Get(Bytes.toBytes(regionName));
>         Result readResult = readTable.get(readGet);
>         byte[] readBytes = readResult.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER);
>         HRegionInfo regionInfo = Writables.getHRegionInfo(readBytes);
>         System.out.println("got region info: " + regionInfo);
>         return regionInfo;
>     }
>
>     private static void createBackupDir(Configuration conf) throws IOException {
>         String path = conf.get("region.remove.backupdir", "regionBackup-" + System.currentTimeMillis());
>         // Pin the (possibly defaulted) path back into the conf so that
>         // moveDataToBackup resolves the same directory created here.
>         conf.set("region.remove.backupdir", path);
>         Path backupDirPath = new Path(path);
>         FileSystem fs = backupDirPath.getFileSystem(conf);
>         System.out.println("creating backup dir: " + backupDirPath.toString());
>         fs.mkdirs(backupDirPath);
>     }
>
>     public int run(String[] strings) throws Exception {
>         try {
>             System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
>                     "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
>             Configuration conf = getConf();
>             Configuration hConfig = HBaseConfiguration.create(conf);
>             hConfig.set("hbase.zookeeper.quorum", System.getProperty("hbase.zookeeper.quorum",
>                     "doop2.dt.sv4.decarta.com,doop3.dt.sv4.decarta.com,doop4.dt.sv4.decarta.com,doop5.dt.sv4.decarta.com,doop7.dt.sv4.decarta.com,doop8.dt.sv4.decarta.com,doop9.dt.sv4.decarta.com,doop10.dt.sv4.decarta.com"));
>             HBaseAdmin admin = new HBaseAdmin(hConfig);
>             HBaseAdmin.checkHBaseAvailable(hConfig);
>
>             System.out.println("regions will be moved out from between region.remove.regionname.start and region.remove.regionname.end (exclusive)");
>             String exclusiveStartRegionName = conf.get("region.remove.regionname.start");
>             if (null == exclusiveStartRegionName) {
>                 throw new RuntimeException("Current implementation requires an exclusive region.remove.regionname.start");
>             }
>             System.out.println("region.remove.regionname.start=" + exclusiveStartRegionName);
>             String exclusiveEndRegionName = conf.get("region.remove.regionname.end");
>             if (null == exclusiveEndRegionName) {
>                 throw new RuntimeException("Current implementation requires an exclusive region.remove.regionname.end");
>             }
>             System.out.println("region.remove.regionname.end=" + exclusiveEndRegionName);
>
>             // CREATE A BACKUP DIR FOR THE REGION DATA TO BE MOVED INTO
>             createBackupDir(hConfig);
>
>             Path hbaseRootPath = FSUtils.getRootDir(hConfig);
>             if (null == hbaseRootPath) {
>                 throw new RuntimeException("couldn't determine hbase root dir");
>             } else {
>                 System.out.println("hbase rooted at " + hbaseRootPath.toString());
>             }
>
>             HTable hMetaTable = new HTable(hConfig, Bytes.toBytes(".META."));
>             System.out.println("connected to .META.");
>
>             // Get region info for the start and end regions.
>             HRegionInfo exclusiveStartRegionInfo = getRegionInfo(exclusiveStartRegionName, hConfig);
>             HRegionInfo exclusiveEndRegionInfo = getRegionInfo(exclusiveEndRegionName, hConfig);
>
>             // CLOSE all the regions starting with the exclusiveStartRegionName (including it),
>             // up to but excluding the exclusiveEndRegionName, and DELETE their rows from .META.
>             Scan scan = new Scan(Bytes.toBytes(exclusiveStartRegionName), Bytes.toBytes(exclusiveEndRegionName));
>             ResultScanner metaScanner = hMetaTable.getScanner(scan);
>             for (Result res : metaScanner) {
>                 // CLOSE REGION
>                 HRegionInfo closedRegion = closeRegion(res, admin);
>                 // MOVE ACTUAL DATA OUT OF HBASE HDFS INTO BACKUP AREA
>                 moveDataToBackup(closedRegion, hConfig);
>                 // DELETE ROW FROM META TABLE
>                 deleteMetaRow(closedRegion, hMetaTable);
>             }
>             metaScanner.close();
>
>             // Now reinsert the start row into .META. with its end row pointing to the start row
>             // of the exclusiveEndRegionInfo. This effectively "relinks" the linked list of .META.,
>             // now that all the interstitial region rows have been removed from .META.
>             relinkStartRow(exclusiveStartRegionInfo, exclusiveEndRegionInfo, hConfig, admin);
>
>             return 0;
>         } catch (Exception ex) {
>             throw new RuntimeException(ex.getMessage(), ex);
>         }
>     }
>
>     private void relinkStartRow(HRegionInfo exclusiveStartRegionInfo, HRegionInfo exclusiveEndRegionInfo,
>             Configuration hConfig, HBaseAdmin admin) throws IllegalArgumentException, IOException {
>         // Recreate the region info for the exclusiveStartRegion such that its endKey points to the
>         // startKey of the exclusiveEndRegion. Only the end key changes; this is the "relink" step.
>         HTableDescriptor descriptor = new HTableDescriptor(exclusiveStartRegionInfo.getTableDesc());
>         byte[] startKey = exclusiveStartRegionInfo.getStartKey();
>         byte[] endKey = exclusiveEndRegionInfo.getStartKey();
>         HRegionInfo newStartRegion = new HRegionInfo(descriptor, startKey, endKey);
>         byte[] value = Writables.getBytes(newStartRegion);
>         // Insert the new entry into .META. using the new HRegionInfo's name as the row key, with an
>         // info:regioninfo cell whose contents is the serialized new HRegionInfo.
>         Put put = new Put(newStartRegion.getRegionName());
>         put.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER, value);
>         HTable metaTable = new HTable(hConfig, ".META.");
>         metaTable.put(put);
>         System.out.println("New row in .META.: " + newStartRegion.getRegionNameAsString()
>                 + " End key is " + Bytes.toString(exclusiveEndRegionInfo.getStartKey()));
>         admin.assign(newStartRegion.getRegionName(), true); // Assign the new region.
>         System.out.println("Assigned the new region " + newStartRegion.getRegionNameAsString());
>     }
>
>     private static void moveDataToBackup(HRegionInfo closedRegion, Configuration conf) throws IOException {
>         Path rootPath = FSUtils.getRootDir(conf);
>         // Split the region name on comma; the table name comes before the first comma.
>         String tablename = closedRegion.getRegionNameAsString().split(",")[0];
>         Path tablePath = new Path(rootPath, tablename);
>         // Split the region name on dot; the region id sits between the last two dots, so taking
>         // the last element (trailing empties are dropped by split) is safe even when the row key
>         // embedded in the region name itself contains dots or commas.
>         String[] dotSplit = closedRegion.getRegionNameAsString().split("\\.", 0);
>         String regionId = dotSplit[dotSplit.length - 1];
>         Path regionPath = new Path(tablePath, regionId);
>         System.out.println(regionPath);
>         FileSystem fs = FileSystem.get(conf);
>
>         Path regionBackupPath = new Path(conf.get("region.remove.backupdir",
>                 "regionBackup-" + System.currentTimeMillis()) + "/" + regionId);
>         System.out.println("moving to: " + regionBackupPath);
>         fs.rename(regionPath, regionBackupPath);
>     }
> }
>
> -----Original Message-----
> From: Stuart Smith [mailto:[email protected]]
> Sent: Saturday, October 29, 2011 1:39 PM
> To: [email protected]
> Subject: Re: PENDING_CLOSE for too long
>
> Hello Geoff,
>
>   I usually don't show up here, since I use CDH, and good form means I should 
> stay on CDH-users,
> But!
>   I've been seeing the same issues for months:
>
>  - PENDING_CLOSE too long, master tries to reassign - I see a continuous
> stream of these.
>  - WrongRegionExceptions due to overlapping regions & holes in the regions.
>
> I just spent all day yesterday cribbing off of St.Ack's check_meta.rb script 
> to write a java program to fix up overlaps & holes in an offline fashion 
> (hbase down, directly on hdfs), and will start testing next week (cross my 
> fingers!).
>
> It seems like the pending close messages can be ignored?
> And once I test my tool, and confirm I know a little bit about what I'm 
> doing, maybe we could share notes?
>
> Take care,
>   -stu
>
>
>
> ________________________________
> From: Geoff Hendrey <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Sent: Saturday, September 3, 2011 12:11 AM
> Subject: RE: PENDING_CLOSE for too long
>
> "Are you having trouble getting to any of your data out in tables?"
>
> depends what you mean. We see corruptions from time to time that prevent
> us from getting data, one way or another. Today's corruption was regions
> with duplicate start and end rows. We fixed that by deleting the
> offending regions from HDFS, and running add_table.rb to restore the
> meta. The other common corruption is the holes in ".META." that we
> repair with a little tool we wrote. We'd love to learn why we see these
> corruptions with such regularity (seemingly much higher than others on
> the list).
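>
> (For reference, we run add_table.rb roughly like this, with the table path
> adjusted for the cluster; TABLENAME is a placeholder:)
>
>   ${HBASE_HOME}/bin/hbase org.jruby.Main add_table.rb /hbase/TABLENAME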
>
> We will implement the timeout you suggest and see how it goes.
>
> Thanks,
> Geoff
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Friday, September 02, 2011 10:51 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: PENDING_CLOSE for too long
>
> Are you having trouble getting to any of your data out in tables?
>
> To get rid of them, try restarting your master.
>
> Before you restart your master, do "HBASE-4126  Make timeoutmonitor
> timeout after 30 minutes instead of 3"; i.e. set
> "hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
> hbase-site.xml.
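>
> That is, something like this in hbase-site.xml:
>
>   <property>
>     <name>hbase.master.assignment.timeoutmonitor.timeout</name>
>     <value>1800000</value>
>   </property>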
>
> St.Ack
>
> On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <[email protected]>
> wrote:
>> In the master logs, I am seeing "regions in transition timed out" and
>> "region has been PENDING_CLOSE for too long, running forced unassign".
>> Both of these log messages occur at INFO level, so I assume they are
>> innocuous. Should I be concerned?
>>
>>
>>
>> -geoff
>>
>>
>
