Maybe the default value of USE_STATS_FOR_PARALLELIZATION differs between clusters running different Hadoop versions.
With Hadoop 3.1.x and Phoenix 5.0, useStatsForParallelization is true by default, and the number of splits = guidepost count + number of regions.
I changed GUIDE_POSTS_WIDTH to another value:
ALTER TABLE <tablename> SET GUIDE_POSTS_WIDTH = 10240000
UPDATE STATISTICS <tablename> ALL
Unfortunately this changed neither the guidepost count nor the split count. Am I missing something here?
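
One way to verify whether the stats were actually regenerated: Phoenix 5.0 stores one row per guidepost in the SYSTEM.STATS table, keyed by the physical table name. A count like the following (run e.g. in sqlline, with your actual table name substituted for the placeholder) should change after UPDATE STATISTICS if the new width took effect:

SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = '<tablename>';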


On 30.01.2019 at 19:38, Thomas D'Silva wrote:
If stats are enabled, PhoenixInputFormat will generate a split per guidepost.

On Wed, Jan 30, 2019 at 7:31 AM Josh Elser <els...@apache.org> wrote:
You can extend/customize the PhoenixInputFormat with your own code to
increase the number of InputSplits and Mappers.
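
The core of such a customization is simple: a subclassed getSplits() would take each region's key range and cut it into k contiguous sub-ranges, so the framework schedules k mappers per region instead of one. The sketch below shows only that subdivision logic with long keys; all class and method names are illustrative, not Phoenix API (the real code would work with org.apache.hadoop.mapreduce.InputSplit and HBase byte[] row keys).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: subdivide one region's [start, end) key range into
// k near-equal contiguous sub-ranges, as a custom InputFormat's getSplits()
// could do to raise the mapper count per region.
public class SplitSubdivider {

    // One [start, end) slice of a region's key space.
    public static final class KeyRange {
        public final long start;
        public final long end;
        public KeyRange(long start, long end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    // Cut [start, end) into up to k contiguous sub-ranges; empty slices
    // (possible when the range is smaller than k) are dropped.
    public static List<KeyRange> subdivide(long start, long end, int k) {
        List<KeyRange> out = new ArrayList<>();
        long span = end - start;
        for (int i = 0; i < k; i++) {
            long s = start + span * i / k;
            long e = start + span * (i + 1) / k;
            if (s < e) out.add(new KeyRange(s, e));
        }
        return out;
    }

    public static void main(String[] args) {
        // One region covering keys [0, 100), subdivided for 4 mappers.
        for (KeyRange r : subdivide(0, 100, 4)) {
            System.out.println(r);
        }
    }
}
```

The sub-ranges stay contiguous and non-overlapping, so each mapper scans a disjoint slice and the union covers the whole region.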

On 1/30/19 6:43 AM, Edwin Litterst wrote:
> Hi,
> I am using PhoenixInputFormat as input source for mapreduce jobs.
> The split count (which determines how many mappers are used for the job)
> is always equal to the number of regions of the table from where I
> select the input.
> Is there a way to increase the number of splits? My job is running too
> slow with only one mapper for every region.
> (Increasing the number of regions is no option.)
> regards,
> Eddie

