Have a look at the code in HFileOutputFormat2#configureIncrementalLoad(Job, HTableDescriptor, RegionLocator). The HTableDescriptor is used only for block-writing details: compression, bloom filters, block size, and block encoding. All other table properties are left to the online table.
I'm not sure what you're trying to accomplish by setting this value in your job. The size of the HFiles produced depends on the data distribution characteristics of your MR job, not on the online table. When you run completeBulkLoad, the generated HFiles are split according to the online table's region boundaries and loaded into the regions. From that point the usual region lifecycle takes over, meaning the online region decides if and when to split based on its store sizes.

Hope that helps,
-n

On Fri, Jan 23, 2015 at 10:29 AM, Ted Yu <[email protected]> wrote:

> Suppose the value used by bulk loading is different from that used by
> the region server. How would the region server deal with two (or more)
> values w.r.t. splitting?
>
> Cheers
>
> On Fri, Jan 23, 2015 at 10:15 AM, Tom Hood <[email protected]> wrote:
>
> > Hi,
> >
> > I'm bulk loading into an empty HBase table and have called
> > HTableDescriptor.setMaxFileSize to override the global setting of
> > HConstants.HREGION_MAX_FILESIZE (i.e. hbase.hregion.max.filesize).
> >
> > This newly configured table is then passed to
> > HFileOutputFormat2.configureIncrementalLoad to set up the MR job to
> > generate the HFiles. That method already configures other properties
> > based on the settings of the table it's given (e.g. compression, bloom
> > filters, data encoding, splits, etc.). Is there a reason it does not
> > also configure HREGION_MAX_FILESIZE based on the value from
> > HTableDescriptor.getMaxFileSize?
> >
> > Thanks,
> > -- Tom
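To make the splitting behavior concrete, here is a toy sketch in plain Python (not HBase code; the function name `split_at_boundaries` and the key ranges are invented for illustration). It models the idea that a bulk-loaded HFile whose key range spans one or more region boundaries is cut at those boundaries before the pieces are handed to the individual regions, which is conceptually what completeBulkLoad does:

```python
def split_at_boundaries(hfile_range, region_start_keys):
    """Split an HFile's (start_key, end_key) range at region start keys.

    Returns the list of sub-ranges, one per region the file overlaps.
    """
    start, end = hfile_range
    # Only boundaries strictly inside the file's key range force a cut.
    cuts = [b for b in region_start_keys if start < b < end]
    points = [start] + cuts + [end]
    return list(zip(points, points[1:]))

# One generated HFile covering keys "a".."m", loaded into an online table
# whose regions start at "", "d", and "h": the file is cut into three pieces.
pieces = split_at_boundaries(("a", "m"), ["", "d", "h"])
print(pieces)  # [('a', 'd'), ('d', 'h'), ('h', 'm')]
```

A file that falls entirely within one region's boundaries is loaded as-is, with no split; after loading, only the online region's own configuration governs further splitting.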
