Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-31 Thread Eddie
Maybe the default value of USE_STATS_FOR_PARALLELIZATION differs between clusters running different Hadoop versions. With Hadoop 3.1.x and Phoenix 5.0, useStatsForParallelization is true by default and the number of splits = guidepost count +
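For context, stats-driven parallelization can be toggled per table; a minimal sketch (the table name MY_TABLE is illustrative, and if I recall correctly the cluster-wide default is the phoenix.use.stats.parallelization property in hbase-site.xml):

```sql
-- Enable stats-driven parallelization for one table (MY_TABLE is illustrative)
ALTER TABLE MY_TABLE SET USE_STATS_FOR_PARALLELIZATION = true;

-- Regenerate the guideposts that parallelization is based on
UPDATE STATISTICS MY_TABLE;
```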

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-31 Thread Eddie
I have done something similar in another case by subclassing DBInputFormat, but I have no idea how this could be done with PhoenixInputFormat without losing data locality (which is guaranteed as long as one split represents one region). Could this be achieved somehow? (The keys are salted.)

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Josh Elser
Please do not take this advice lightly. Adding (or increasing) salt buckets can have a serious impact on the execution of your queries.

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Ankit Singhal
As Thomas said, the number of splits will be equal to the number of guideposts available for the table, or the ones required to cover the filter. If you are seeing one split per region, then either stats are disabled or the guidepost width is set higher than the size of the region, so try reducing the guidepost width.
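To make that concrete, a minimal sketch of lowering the guidepost width for one table and recollecting stats (MY_TABLE and the 10 MB value are illustrative; the cluster-wide default comes from phoenix.stats.guidepost.width in hbase-site.xml):

```sql
-- Lower the guidepost width for one table (value in bytes; 10 MB is illustrative)
ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = 10485760;

-- Recollect stats so more guideposts (and thus more splits) are generated
UPDATE STATISTICS MY_TABLE;
```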

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread venkata subbarayudu
You may recreate the table with the SALT_BUCKETS table option to get a reasonable number of regions, and you may try a secondary index to make the query run faster in case your MapReduce job performs specific filters.
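A minimal sketch of what that could look like (table, column, and index names are illustrative):

```sql
-- Recreate the table pre-split into salt buckets (schema is illustrative)
CREATE TABLE MY_TABLE (
    id  BIGINT NOT NULL PRIMARY KEY,
    val VARCHAR
) SALT_BUCKETS = 8;

-- Secondary index to serve the MapReduce job's filter column
CREATE INDEX MY_IDX ON MY_TABLE (val);
```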

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Thomas D'Silva
If stats are enabled, PhoenixInputFormat will generate a split per guidepost.

Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Josh Elser
You can extend/customize PhoenixInputFormat with your own code to increase the number of InputSplits and Mappers.
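Hadoop/Phoenix wiring aside, the core of such a customization is cutting each per-region key range into several sub-ranges, one per extra InputSplit. A self-contained sketch of just that subdivision logic (class and method names are illustrative, and plain long keys stand in for row-key byte ranges):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: subdivide one [start, end) key range, as covered by a
// single region, into n sub-ranges so each can back its own InputSplit.
public class SplitSubdivider {

    static List<long[]> subdivide(long start, long end, int n) {
        List<long[]> splits = new ArrayList<>();
        long span = end - start;
        for (int i = 0; i < n; i++) {
            long s = start + span * i / n;
            long e = (i == n - 1) ? end : start + span * (i + 1) / n;
            if (s < e) {                       // skip empty ranges for tiny spans
                splits.add(new long[]{s, e});
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // One region covering keys [0, 100), subdivided into 4 sub-splits
        for (long[] r : subdivide(0, 100, 4)) {
            System.out.println(r[0] + "-" + r[1]);
        }
    }
}
```

In a real subclass one would presumably override getSplits(), apply this kind of subdivision to each split returned by the parent PhoenixInputFormat, and carry over the original split's region locations so data locality is preserved.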

split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Edwin Litterst
Hi, I am using PhoenixInputFormat as the input source for MapReduce jobs. The split count (which determines how many mappers are used for the job) is always equal to the number of regions of the table from which I select the input. Is there a way to increase the number of splits? My job is running