You just have to add the configuration property that enables multipart in the -site.xml, or you can probably just pass it as -Dfs.s3n.multipart.uploads.enabled=true. There is nothing to change in the tool; it is just a configuration option to enable. The s3n connector will pick up the configuration and use multipart upload, allowing you to transfer a single file > 5G.
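
Assuming ExportSnapshot accepts generic -D options the way other Hadoop tools run through ToolRunner do (hence Matteo's "you can probably just pass it"), an untested sketch of the command from the thread with the flag added would be:

  /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -Dfs.s3n.multipart.uploads.enabled=true \
      -snapshot table_2_2016020504 \
      -copy-to s3n://${bucketname}

The -D option goes before the tool-specific arguments so the generic option parser picks it up; the snapshot name and bucket variable are the placeholders from Vishnu's original command.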
Matteo

On Fri, Feb 5, 2016 at 9:16 AM, Vishnu Amdiyala <[email protected]> wrote:
> I understand how multipart upload works when the file > 5GB resides on
> HDFS, but I am doing something like this:
>
> /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot table_2_2016020504 -copy-to s3n://${bucketname}
>
> which triggers a MR job to put data into the bucket, and that job is
> failing since each HFile from the table is > 5GB. Do I have to re-write
> the snapshot tool to make use of the multipart upload API?
>
> Thanks!
> Vishnu
>
> On Thu, Feb 4, 2016 at 7:54 PM, Matteo Bertozzi <[email protected]> wrote:
> > There is nothing to split files in ExportSnapshot because you don't need it.
> >
> > Take a look at http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html
> > "With a single PUT operation you can upload objects up to 5 GB in size"
> > "Using the Multipart upload API you can upload large objects, up to 5 TB."
> >
> > You just have to configure the s3 connector to use multipart, and you'll
> > be able to upload files > 5G.
> >
> > Matteo
> >
> > On Thu, Feb 4, 2016 at 7:50 PM, Vishnu Amdiyala <[email protected]> wrote:
> > > Thank you guys for the quick response. My question is: how do I generate
> > > part files out of these HFiles to upload to S3? The ExportSnapshot tool
> > > which I use doesn't allow more mappers than the number of files [correct
> > > me if I am wrong]. So, how will I be able to generate splits out of each
> > > bulk file > 5GB?
> > >
> > > On Thu, Feb 4, 2016 at 7:14 PM, Ted Yu <[email protected]> wrote:
> > > > Vishnu:
> > > > Please take a look at
> > > > hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
> > > > for multipart-related config parameters (other than the one mentioned
> > > > by Matteo):
> > > >
> > > > fs.s3n.multipart.uploads.block.size
> > > > fs.s3n.multipart.copy.block.size
> > > >
> > > > Cheers
> > > >
> > > > On Thu, Feb 4, 2016 at 7:00 PM, Matteo Bertozzi <[email protected]> wrote:
> > > > > The multipart upload is on the s3 connector.
> > > > > You can tune your connector to use multipart:
> > > > > fs.s3n.multipart.uploads.enabled = true
> > > > >
> > > > > Matteo
> > > > >
> > > > > On Thu, Feb 4, 2016 at 6:34 PM, Vishnu Amdiyala <[email protected]> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to back up snapshots of an HBase table to an S3 bucket,
> > > > > > where each HFile is sized > 5GB, and the export fails due to S3's
> > > > > > 5GB single-PUT limitation. The ExportSnapshot source says that the
> > > > > > mappers are set to a max of the total number of files. Is there a
> > > > > > way to use this tool to split files and upload to S3 in parts?
> > > > > >
> > > > > > Thanks!
> > > > > > Vishnu
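
(Following up on Ted's pointer above: if you go the -site.xml route instead of the command line, a sketch of the core-site.xml entries would look like the following. The property names come from the thread and core-default.xml, but the size values below are only illustrative, so check your Hadoop version's core-default.xml before copying them.)

  <property>
    <name>fs.s3n.multipart.uploads.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- part size for multipart uploads; value is illustrative -->
    <name>fs.s3n.multipart.uploads.block.size</name>
    <value>67108864</value>
  </property>
  <property>
    <!-- part size for multipart copies; value is illustrative -->
    <name>fs.s3n.multipart.copy.block.size</name>
    <value>5368709120</value>
  </property>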
