Ok... So when the basic tools don't work... How about rolling your own?
Step 1: Take a snapshot and write the file(s) to a different location outside of /hbase (export to local disk on the cluster).

Step 2: Write your own M/R job and control the number of mappers that read from HDFS and write to S3, assuming you want a block-for-block match. If you want to change the number of files (each region would otherwise be a separate file), you could do the write to S3 in the reduce phase, which is what you want here. There's a rough sketch at the bottom of this message.

On Jun 4, 2014, at 7:39 AM, Damien Hardy <[email protected]> wrote:

> Hello,
>
> We are trying to export an HBase table to S3 for backup purposes.
> By default the Export tool runs one map per region, and we want to limit
> the output bandwidth going over the internet (to Amazon S3).
>
> We were thinking of adding some reducers to limit the number of writers,
> but this is explicitly hardcoded to 0 in the Export class:
> ```
> // No reducers. Just write straight to output files.
> job.setNumReduceTasks(0);
> ```
>
> Is there another way (a property?) in Hadoop to limit output bandwidth?
>
> --
> Damien

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk.

Michael Segel
michael_segel (AT) hotmail.com
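Rough, untested sketch of the reducer idea (reading the live table directly rather than the staged snapshot files): an Export-style job that still runs one map per region, but funnels everything through a few reduce tasks that do the S3 writes, so the reducer count caps the number of concurrent uploaders. The table name, bucket path, and reducer count are placeholders, and it assumes an HBase 0.96+ mapreduce setup where TableMapReduceUtil registers the serialization needed to push Result objects through the shuffle, plus S3 credentials already configured on the cluster.

```
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ThrottledExport {

  /** Pass-through reducer; the (small) number of reduce tasks caps how many tasks write to S3 at once. */
  public static class S3WriterReducer
      extends Reducer<ImmutableBytesWritable, Result, ImmutableBytesWritable, Result> {
    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Result> values, Context context)
        throws IOException, InterruptedException {
      for (Result value : values) {
        context.write(key, value);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-export-to-s3-throttled");
    job.setJarByClass(ThrottledExport.class);

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // full-table scan; don't churn the block cache

    // One map per region reads the table, same as the stock Export tool.
    TableMapReduceUtil.initTableMapperJob(
        "my_table",                       // placeholder table name
        scan,
        IdentityTableMapper.class,
        ImmutableBytesWritable.class,
        Result.class,
        job);

    // Unlike Export (which hardcodes 0 reducers), funnel all output through a
    // handful of reduce tasks so only that many writers hit S3 concurrently.
    job.setReducerClass(S3WriterReducer.class);
    job.setNumReduceTasks(4);             // tune to the upstream bandwidth you can spare

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Result.class);
    SequenceFileOutputFormat.setOutputPath(job,
        new Path("s3n://my-bucket/hbase-backup/my_table")); // placeholder bucket/path

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Fewer reducers means fewer simultaneous S3 connections (and a slower overall export), so the reducer count is effectively the bandwidth knob here.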
