Have a look at the code in HFileOutputFormat2#configureIncrementalLoad(Job, HTableDescriptor, RegionLocator). The HTableDescriptor is used only for block-writing details: compression, bloom filters, block size, and block encoding. All other table properties are left to the online table.
I'm not sure what you're trying to accomplish by setting this value in your job. The size of the HFiles produced depends on the data distribution characteristics of your MR job, not on the online table. When you run completeBulkLoad, the generated HFiles are split according to the online table's region boundaries and loaded into the regions. From that point the usual region lifecycle takes over, meaning the online region decides if and when to split based on its store sizes.

Hope that helps,
-n

On Fri, Jan 23, 2015 at 10:29 AM, Ted Yu <[email protected]> wrote:

> Suppose the value used by bulk loading is different from that used by
> the region server. How would the region server deal with two (or more)
> values w.r.t. splitting?
>
> Cheers
>
> On Fri, Jan 23, 2015 at 10:15 AM, Tom Hood <[email protected]> wrote:
>
> > Hi,
> >
> > I'm bulk loading into an empty HBase table and have called
> > HTableDescriptor.setMaxFileSize to override the global setting of
> > HConstants.HREGION_MAX_FILESIZE (i.e. hbase.hregion.max.filesize).
> >
> > This newly configured table is then passed to
> > HFileOutputFormat2.configureIncrementalLoad to set up the MR job to
> > generate the HFiles. That method already configures other properties
> > based on the settings of the table it's given (e.g. compression, bloom
> > filters, data encoding, splits, etc.). Is there a reason it does not
> > also configure HREGION_MAX_FILESIZE based on the value from
> > HTableDescriptor.getMaxFileSize?
> >
> > Thanks,
> > -- Tom
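To make the splitting behavior concrete, here is a toy sketch in plain Python (not HBase code; the function name `split_at_boundaries` and the key ranges are invented for illustration). It models the idea that a bulk-loaded HFile whose key range spans one or more region boundaries is cut at those boundaries before the pieces are handed to the individual regions, which is conceptually what completeBulkLoad does:

```python
def split_at_boundaries(hfile_range, region_start_keys):
    """Split an HFile's (start_key, end_key) range at region start keys.

    Returns the list of sub-ranges, one per region the file overlaps.
    """
    start, end = hfile_range
    # Only boundaries strictly inside the file's key range force a cut.
    cuts = [b for b in region_start_keys if start < b < end]
    points = [start] + cuts + [end]
    return list(zip(points, points[1:]))

# One generated HFile covering keys "a".."m", loaded into an online table
# whose regions start at "", "d", and "h": the file is cut into three pieces.
pieces = split_at_boundaries(("a", "m"), ["", "d", "h"])
print(pieces)  # [('a', 'd'), ('d', 'h'), ('h', 'm')]
```

A file that falls entirely within one region's boundaries is loaded as-is, with no split; after loading, only the online region's own configuration governs further splitting.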
