Would this question be better suited for the developer mailing list?

Adrian
On Tue, Oct 8, 2013 at 7:01 PM, Adrian Sandulescu <[email protected]> wrote:

> Also, here are the files in S3:
>
> $ hadoop fs -ls s3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
> Found 1 items
> -rwxrwxrwx 1 741047906 2013-10-08 13:45 s3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>
> $ hadoop fs -ls s3://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
> Found 1 items
> -rwxrwxrwx 1 741047906 1970-01-01 00:00 s3://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>
> Thank you,
> Adrian
>
> On Tue, Oct 8, 2013 at 6:44 PM, Adrian Sandulescu <[email protected]> wrote:
>
>> Yes, I was just digging.
>>
>> From a successful s3n:// import:
>>
>> 2013-10-08 14:57:04,816 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>> 2013-10-08 14:57:04,965 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening 's3n://AKIAIWNEBAESDM4DKBGA:UEFai8K1IBrjOKlXB2hbIbTFDJ8apubuA01LCc02@hbase-export/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading
>> 2013-10-08 14:57:05,039 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem: Opening key 'hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1' for reading at position '0'
>> 2013-10-08 14:57:05,299 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: Skip copy v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 to hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1, same file.
>> 2013-10-08 14:57:05,300 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy completed for input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>>
>> From a failed s3:// import:
>>
>> 2013-10-08 15:27:21,810 INFO org.apache.hadoop.hbase.snapshot.ExportSnapshot: copy file input=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1 output=hdfs://mycluster:8020/hbase/.archive/campaign_digital_ad_time_dev/d9e6cff519bdd232d7a7f8eb676d92ee/v/6912d21c32754e7cb34ac85952a67ce1
>> 2013-10-08 15:27:21,834 ERROR org.apache.hadoop.hbase.snapshot.ExportSnapshot: Unable to open source file=v/campaign_digital_ad_time_dev=d9e6cff519bdd232d7a7f8eb676d92ee-6912d21c32754e7cb34ac85952a67ce1
>> java.io.IOException: No such file.
>>     at org.apache.hadoop.fs.s3.S3FileSystem.checkFile(S3FileSystem.java:181)
>>     at org.apache.hadoop.fs.s3.S3FileSystem.open(S3FileSystem.java:246)
>>     at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.tryOpen(FileLink.java:289)
>>     at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:120)
>>     at org.apache.hadoop.hbase.io.FileLink$FileLinkInputStream.<init>(FileLink.java:111)
>>     at org.apache.hadoop.hbase.io.FileLink.open(FileLink.java:390)
>>     at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.openSourceFile(ExportSnapshot.java:302)
>>     at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.copyFile(ExportSnapshot.java:175)
>>     at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:146)
>>     at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.map(ExportSnapshot.java:95)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>
>> Thank you,
>> Adrian
>>
>> On Tue, Oct 8, 2013 at 5:56 PM, Ted Yu <[email protected]> wrote:
>>
>>> bq. 13/10/08 13:32:01 INFO mapred.JobClient: MISSING_FILES=1
>>>
>>> Are you able to provide more context from the job output?
>>>
>>> Thanks
>>>
>>> On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu <[email protected]> wrote:
>>>
>>> > Hello everyone,
>>> >
>>> > I'm using this tool to export and "import" snapshots from S3:
>>> > https://github.com/lospro7/snapshot-s3-util/blob/master/src/main/java/com/imgur/backup/SnapshotS3Util.java
>>> >
>>> > I'm using this tool because it seems like a better option than ExportTable, considering there isn't another HDFS cluster on hand.
>>> >
>>> > It uses the following trick to make ExportSnapshot "import" from S3 to the local HDFS:
>>> >
>>> > // Override dfs configuration to point to S3
>>> > config.set("fs.default.name", s3protocol + accessKey + ":" + accessSecret + "@" + bucketName);
>>> > config.set("fs.defaultFS", s3protocol + accessKey + ":" + accessSecret + "@" + bucketName);
>>> > config.set("fs.s3.awsAccessKeyId", accessKey);
>>> > config.set("fs.s3.awsSecretAccessKey", accessSecret);
>>> > config.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
>>> > config.set("hbase.rootdir", s3Url);
>>> >
>>> > Imports work great, but only when using the s3n:// protocol (which means an HFile limit of 5 GB).
>>> > When using the s3:// protocol, I get the following:
>>> > 13/10/08 13:32:01 INFO mapred.JobClient: MISSING_FILES=1
>>> >
>>> > The author said he wasn't able to debug it and just uses s3n:// until it becomes a problem.
>>> >
>>> > Has anyone encountered this when using ExportSnapshot?
>>> > Can you please point me in the right direction?
>>> >
>>> > Adrian
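
For reference, the override trick quoted at the bottom of the thread amounts to running the stock ExportSnapshot tool "in reverse": point fs.defaultFS and hbase.rootdir at the S3 bucket so S3 becomes the source cluster, then -copy-to the local HDFS. Below is a minimal sketch of that idea, assuming HBase 0.94-era APIs; the class name, credentials, bucket, snapshot name and HDFS URL are placeholders, not values from the thread, and this is an illustration of the SnapshotS3Util approach rather than a drop-in replacement for it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; all literals below are placeholders.
public class SnapshotS3ImportSketch {
  public static void main(String[] args) throws Exception {
    String accessKey = "MY_ACCESS_KEY";      // placeholder
    String secretKey = "MY_SECRET_KEY";      // placeholder
    String bucket    = "my-hbase-export";    // placeholder
    String snapshot  = "my_snapshot";        // placeholder
    String s3Bucket  = "s3n://" + accessKey + ":" + secretKey + "@" + bucket;

    Configuration conf = HBaseConfiguration.create();
    // Same overrides as SnapshotS3Util: make the S3 bucket the "source cluster".
    conf.set("fs.default.name", s3Bucket);
    conf.set("fs.defaultFS", s3Bucket);
    conf.set("fs.s3.awsAccessKeyId", accessKey);      // as in the quoted code
    conf.set("fs.s3.awsSecretAccessKey", secretKey);
    conf.set("fs.s3n.awsAccessKeyId", accessKey);     // s3n:// reads the fs.s3n.* keys
    conf.set("fs.s3n.awsSecretAccessKey", secretKey);
    conf.set("hbase.tmp.dir", "/tmp/hbase-${user.name}");
    conf.set("hbase.rootdir", s3Bucket + "/hbase");   // path the snapshot was exported under

    // ExportSnapshot then copies the snapshot from the S3 "rootdir" into the local cluster.
    int rc = ToolRunner.run(conf, new ExportSnapshot(), new String[] {
        "-snapshot", snapshot,
        "-copy-to", "hdfs://mycluster:8020/hbase",
        "-mappers", "4"
    });
    System.exit(rc);
  }
}

Going the other direction (cluster to S3) should not need the overrides at all; running ExportSnapshot against the normal cluster configuration with -copy-to s3n://ACCESS:SECRET@bucket/hbase is the usual pattern.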

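The hadoop fs -ls comparison near the top can also be reproduced through the Hadoop FileSystem API, which makes it explicit that the two schemes resolve to different implementations: s3n:// to org.apache.hadoop.fs.s3native.NativeS3FileSystem and s3:// to the block-based org.apache.hadoop.fs.s3.S3FileSystem, both of which are visible in the logs and stack trace above. A small diagnostic sketch, with placeholder credentials, bucket and key:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper; the credentials, bucket and key are placeholders.
public class S3SchemeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String auth = "MY_ACCESS_KEY:MY_SECRET_KEY@my-hbase-export";   // placeholder
    String key  = "/hbase/.archive/mytable/region/v/hfile";        // placeholder

    for (String scheme : new String[] { "s3n", "s3" }) {
      FileSystem fs = FileSystem.get(new URI(scheme + "://" + auth), conf);
      // Which implementation handles this scheme (NativeS3FileSystem vs S3FileSystem)?
      System.out.println(scheme + " -> " + fs.getClass().getName());
      for (FileStatus st : fs.listStatus(new Path(scheme + "://" + auth + key))) {
        System.out.println("  " + st.getPath() + " len=" + st.getLen()
            + " mtime=" + st.getModificationTime());
      }
    }
  }
}

One plausible thing to check from there: the block-based s3:// filesystem stores data in its own block format rather than as plain objects, so it generally cannot read HFiles that were written through s3n:// (and vice versa), which could account for the "No such file" error on import; it also does not record modification times, which would explain the 1970-01-01 00:00 timestamp in the s3:// listing above.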