Fabio,
One of the simplest ways to achieve this is to disable split grouping
completely, which gets rid of the dynamic split generation based on the
state of the cluster. You may end up with a large number of tasks in this
case, though. (You'll have to check with Hive on how to disable this.)
Other than this, setting min-size and max-size to the same value should
produce the desired results. There can still be some variance in the groups
generated, depending on the order in which HDFS gives back its block
locations.
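
To illustrate why pinning both sizes should help, here is a rough sketch in
Python. It is a simplified model of size-clamped grouping, not the actual Tez
code: the function name, the "waves" multiplier and the example numbers are
all assumptions made up for the illustration. It also ignores block locality,
which is where the small variances mentioned above come from.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB


def estimated_group_count(total_bytes, available_slots, min_size, max_size, waves=1.7):
    """Simplified, hypothetical model of size-clamped split grouping.

    The starting point depends on the cluster (free slots times a "waves"
    multiplier); the per-group size is then forced into [min_size, max_size].
    """
    # Cluster-dependent target: this is the part that changes between runs.
    desired = max(1, int(available_slots * waves))
    length_per_group = total_bytes / desired

    if length_per_group > max_size:
        # Groups would be too large: fall back to max_size-sized groups.
        return math.ceil(total_bytes / max_size)
    if length_per_group < min_size:
        # Groups would be too small: fall back to min_size-sized groups.
        return math.ceil(total_bytes / min_size)
    # Within bounds: the cluster-dependent target is used as-is.
    return desired


if __name__ == "__main__":
    total = 10 * GIB
    for slots in (10, 200):
        loose = estimated_group_count(total, slots, 16 * MIB, 1 * GIB)
        pinned = estimated_group_count(total, slots, 256 * MIB, 256 * MIB)
        print(f"slots={slots}: loose bounds -> {loose}, min == max -> {pinned}")
```

With loose bounds the count follows the number of free slots; with min-size
equal to max-size the model gives the same count for an idle and a busy
cluster, because the group size no longer has any room to move.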


On Thu, Feb 19, 2015 at 1:47 AM, Fabio C. <anyte...@gmail.com> wrote:

> Hi everyone,
> I see that Hive on Tez dynamically chooses the number of tasks to launch
> for each vertex in the generated DAG according to cluster load (other than
> data size).
> For research purposes I'd like to avoid this feature since I need every
> query (running on the same datasets) to be executed with the same number of
> tasks, regardless of the state of the cluster (if I run query X, n tasks
> have to be allocated in any case).
> At this point I can't make tests with heavy workloads, so I want to ask
> you if you think setting tez.am.grouping.min-size and
> tez.am.grouping.max-size to the same value can do the trick, or if you have
> any better suggestion to achieve this behavior.
> Other than this feature, is there anything else that could change the
> number of splits across different runs of the same query?
>
> Thanks a lot
>
> Fabio
>
>
