I was able to successfully export data to S3 using the command below:

hbase org.apache.hadoop.hbase.mapreduce.Export docs s3n://KEY:ACCESSKEY@fdocshbase/data/bkp1 1440760612 1440848237
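For reference, and assuming I am reading the tools' usage banners correctly, the two utilities are invoked as follows (the versions/starttime/endtime arguments to Export are optional):

hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>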
I was then able to import the data into a new table (created beforehand) with:

hbase org.apache.hadoop.hbase.mapreduce.Import docsnew s3n://KEY:ACCESSKEY@fdocshbase/data/bkp1

I also exported other time ranges to sibling directories such as bkp2 and bkp3. However, when I try to import all of those directories into HBase in one run, I get a FileNotFoundException:

[hdfs@ip-172-31-59-10 ~]$ hbase org.apache.hadoop.hbase.mapreduce.Import docsnew s3n://KEY:ACCESSKEY@fdocshbase
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.0.0-2557/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-08-29 09:21:02,058 INFO [main] impl.TimelineClientImpl: Timeline service address: http://ip-XXXXXXXXX.ec2.internal:8188/ws/v1/timeline/
2015-08-29 09:21:02,214 INFO [main] client.RMProxy: Connecting to ResourceManager at ip-XXXXXXXXX.ec2.internal/XXXXXXXXX:8050
2015-08-29 09:21:03,961 INFO [main] input.FileInputFormat: Total input paths to process : 1
2015-08-29 09:21:04,021 INFO [main] mapreduce.JobSubmitter: Cleaning up the staging area /user/hdfs/.staging/job_1440846401761_0026
Exception in thread "main" java.io.FileNotFoundException: No such file or directory 's3n://XXXXXXX:YYYYYYYYYYY@fdocshbase/data/data'
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:507)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:67)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.hadoop.hbase.mapreduce.Import.main(Import.java:544)

Is Import always looking for a directory named "data"? I exported data to HDFS as well and hit a similar issue there. At first I created backup directories like /fdocshbase/bkp1, /fdocshbase/bkp2, and so on, which caused the same problem during import. But when I created the directories as /fdocshbase/data/bkp1, /fdocshbase/data/bkp2, etc. and passed /fdocshbase/data/ as the input to Import, it worked. That same layout does not work on S3: I even tried creating fdocshbase/data/data and placing all the bkp* directories inside it, with no luck.
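In the meantime I can work around this by running one Import per backup directory; a minimal sketch of that loop (directory names assumed to match my layout above):

for dir in bkp1 bkp2 bkp3; do
  # one Import job per exported time-range directory
  hbase org.apache.hadoop.hbase.mapreduce.Import docsnew "s3n://KEY:ACCESSKEY@fdocshbase/data/${dir}"
done

But that launches a separate MapReduce job per directory, so I would prefer a single Import over the parent path. I also wondered whether passing -Dmapreduce.input.fileinputformat.input.dir.recursive=true to the Import job would make FileInputFormat descend into the subdirectories, but I have not verified that this works with s3n.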
Any thoughts? Any quick resolutions?

Regards,
Arun