Hi everyone,
We are currently working on upgrading from HBase 0.94 to HBase 1.2. We use
TableMapReduceUtil.initTableSnapshotMapperJob to read snapshots that are
stored on S3 in a few Hadoop jobs.
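Roughly, the setup looks like this (the snapshot name, mapper class, and
paths below are placeholders, and imports from org.apache.hadoop.hbase are
elided, but the API calls are the ones we use):

    Configuration conf = HBaseConfiguration.create();
    // The HBase root dir (and therefore the snapshots) lives on S3
    conf.set("hbase.rootdir", "s3n://our-bucket/hbase");
    Job job = Job.getInstance(conf, "snapshot-scan");
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "our_snapshot",                // snapshot name (placeholder)
        new Scan(),                    // scan over the whole snapshot
        OurMapper.class,               // our TableMapper subclass (placeholder)
        ImmutableBytesWritable.class,  // output key class
        Result.class,                  // output value class
        job,
        true,                          // addDependencyJars
        new Path("/tmp/restore"));     // restoreDir, currently on the cluster's HDFS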
I am upgrading the jobs to read the new snapshots with the HBase 1.2 jar,
and this is causing a few problems. In particular, I am running into what
seems to be new code complaining that the restoreDir and rootDir are on
different filesystems. This is the code:
    if (!restoreDir.getFileSystem(conf).getUri()
        .equals(rootDir.getFileSystem(conf).getUri())) {
      throw new IllegalArgumentException("Filesystems for restore directory "
          + "and HBase root directory should be the same");
    } else if (restoreDir.toUri().getPath()
        .startsWith(rootDir.toUri().getPath())) {
      throw new IllegalArgumentException("Restore directory cannot be a sub "
          + "directory of HBase root directory. RootDir: " + rootDir
          + ", restoreDir: " + restoreDir);
    }
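If I understand the first check correctly, it compares the URIs of the two
filesystems, so with our paths it can never pass. A minimal illustration
(bucket name and paths are placeholders):

    Configuration conf = new Configuration();
    Path restoreDir = new Path("hdfs:///tmp/restore");
    Path rootDir = new Path("s3n://our-bucket/hbase");
    // Prints something like hdfs://namenode:8020 vs. s3n://our-bucket,
    // so equals() is false and the IllegalArgumentException above is thrown
    System.out.println(restoreDir.getFileSystem(conf).getUri());
    System.out.println(rootDir.getFileSystem(conf).getUri());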
And this is the exception:
Exception in thread "main" java.lang.IllegalArgumentException: Filesystems
for restore directory and HBase root directory should be the same
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:716)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:403)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:205)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:365)
Our restoreDir is on the Hadoop cluster in HDFS, and the rootDir is on S3,
so the first exception is thrown. I also tried setting the restoreDir to be
on S3, but that caused another exception:
Exception in thread "main" java.io.IOException:
java.util.concurrent.ExecutionException:
java.lang.IllegalArgumentException: Wrong FS: s3n://..., expected: hdfs://...
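For reference, the only change in that attempt was the restore path
(placeholder bucket again):

    // Second attempt: restoreDir on S3 alongside the root dir
    Path restoreDir = new Path("s3n://our-bucket/tmp/restore");

That gets past the same-filesystem check, but the "Wrong FS: s3n://...,
expected: hdfs://..." error suggests something further in still resolves
paths against the cluster's default (HDFS) filesystem.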
We didn't see this problem at all in the old jobs that read HBase 0.94
snapshots with the 0.94 jar, where the restoreDir was on HDFS and the
rootDir was on S3. All of the paths have remained unchanged. I noticed that
the docs for the last argument changed slightly, from
tableRootDir - The directory where the temp table will be created
<https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html#initTableSnapshotMapperJob(java.lang.String,%20org.apache.hadoop.hbase.client.Scan,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class,%20org.apache.hadoop.mapreduce.Job,%20boolean,%20org.apache.hadoop.fs.Path)>
to
tmpRestoreDir - a temporary directory to copy the snapshot files into.
Current user should have write permissions to this directory, and this
should not be a subdirectory of rootdir. After the job is finished, restore
directory can be deleted.
<https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html>
What exactly has changed? The snapshots will continue to be stored on S3.
What can we do to make it so that they can still be read by this method?
Laura