Thanks, Gopal.
I have checked the Tez 0.6.0 code.
From the code, there is a locationHint (I am not sure when the location
hint is set), but it looks like a task will reuse an existing container
before asking for a new one.
Also, I have seen there is a warm-up function on the client side, so I think
if I can warm up 1000 containers for one Tez job, that job should run fast.
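Something like this is what I have in mind (a rough sketch using the
session-mode pre-warm API; the session name, the 1000-container count and
the 1 GB / 1 vcore sizing are just my assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.PreWarmVertex;
import org.apache.tez.dag.api.TezConfiguration;

public class PreWarmExample {
  public static void main(String[] args) throws Exception {
    TezConfiguration tezConf = new TezConfiguration(new Configuration());
    // Pre-warming only makes sense in session mode, where containers
    // are kept alive between DAGs.
    tezConf.setBoolean(TezConfiguration.TEZ_AM_SESSION_MODE, true);

    TezClient tezClient = TezClient.create("prewarm-session", tezConf);
    tezClient.start();

    // Assumed sizing: 1000 containers of 1 GB / 1 vcore each;
    // this should be adjusted to the actual queue capacity.
    PreWarmVertex preWarm = PreWarmVertex.create("prewarm", 1000,
        Resource.newInstance(1024, 1));
    tezClient.preWarm(preWarm);

    // ... submit DAGs through this same session so they can reuse
    // the warmed-up containers ...

    tezClient.stop();
  }
}

Does that match how preWarm is meant to be used?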
Also, my network is 10GbE, so I think network I/O is not a bottleneck in my
case, and whether the data-aware locality hint is set is not that important.
But if I can make all tasks run at the same time, that would be great.
>set tez.grouping.split-waves=1.7
>the split-waves measures current queue capacity *1.7x to go wider than the
>actual available capacity
[skater] But my environment uses the default queue and the root user, so the
1.7x multiplier is not useful in this case.
>set tez.grouping.min-size=16777216
[skater] What does this parameter mean? Is there a wiki where I can look up
these options?
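(My current guess, please correct me if I am wrong: 16777216 bytes is 16 MB,
so tez.grouping.min-size looks like a lower bound on the size of each grouped
split. If that is right, then combined with the 32 MB mapred.max.split.size
you suggested, each Tez task would read between 16 MB and 32 MB.)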
Thanks in advance.
Following is the code I found for allocating a new container:
if (locationHint != null) {
  TaskBasedLocationAffinity taskAffinity = locationHint.getAffinitizedTask();
  if (taskAffinity != null) {
    // Task-based affinity: try to run this attempt on the container that
    // ran a successful attempt of the affinitized task.
    Vertex vertex = appContext.getCurrentDAG().getVertex(taskAffinity.getVertexName());
    Preconditions.checkNotNull(vertex, "Invalid vertex in task based affinity "
        + taskAffinity + " for attempt: " + taskAttempt.getID());
    int taskIndex = taskAffinity.getTaskIndex();
    Preconditions.checkState(taskIndex >= 0 && taskIndex < vertex.getTotalTasks(),
        "Invalid taskIndex in task based affinity " + taskAffinity
        + " for attempt: " + taskAttempt.getID());
    TaskAttempt affinityAttempt = vertex.getTask(taskIndex).getSuccessfulAttempt();
    if (affinityAttempt != null) {
      Preconditions.checkNotNull(affinityAttempt.getAssignedContainerID(),
          affinityAttempt.getID());
      // Reuse the affinitized attempt's container instead of asking for a new one.
      taskScheduler.allocateTask(taskAttempt,
          event.getCapability(),
          affinityAttempt.getAssignedContainerID(),
          Priority.newInstance(event.getPriority()),
          event.getContainerContext(),
          event);
      return;
    }
    LOG.info("Attempt: " + taskAttempt.getID() + " has task based affinity to "
        + taskAffinity + " but no locality information exists for it. Ignoring hint.");
    // fall through with null hosts/racks
  } else {
    // Plain location hint: extract the preferred hosts and racks, if any.
    hosts = (locationHint.getHosts() != null) ? locationHint.getHosts()
        .toArray(new String[locationHint.getHosts().size()]) : null;
    racks = (locationHint.getRacks() != null) ? locationHint.getRacks()
        .toArray(new String[locationHint.getRacks().size()]) : null;
  }
}
// No usable affinity: allocate by hosts/racks (both may be null).
taskScheduler.allocateTask(taskAttempt,
    event.getCapability(),
    hosts,
    racks,
    Priority.newInstance(event.getPriority()),
    event.getContainerContext(),
    event);
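If I read this right, a brand-new container is only requested when there is
no successful affinitized attempt whose container can be reused; plain
host/rack hints (or no hints at all) just fall through to the normal
allocateTask call.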
At 2015-04-11 12:58:06, "Gopal Vijayaraghavan" <[email protected]> wrote:
>
>> I have a Hive full-scan job. With Hive on MR I can fully use the
>>whole cluster's 1000 CPU vcores (I use the split size to make the number
>>of mapper tasks 1200).
>> But with Tez, only around 700 vcores are used, even though I have set the
>>same Hive split size. So how do I configure Tez to make it fully use all
>>the cluster resources?
>
>If you're on hive-1.0 or later, the option to go wide is called
>tez.grouping.split-waves.
>
>With ORC, the regular MRv2 split generation produces empty tasks (so not
>all map tasks process valid ranges).
>
>But to get it as wide as possible
>
>set mapred.max.split.size=33554432
>set tez.grouping.split-waves=1.7
>set tez.grouping.min-size=16777216
>
>should do the trick; split-waves measures the current queue capacity and
>multiplies it by 1.7x so that you go wider than the actual available
>capacity.
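>(Rough arithmetic, with assumed numbers: a queue with room for 1000
>containers and split-waves=1.7 targets about 1000 * 1.7 = 1700 grouped
>splits, each sized between the 16 MB min-size and the 32 MB max split size
>above.)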
>
>In previous versions (0.13/0.14), "set" commands don't work, so the
>options have to be prefixed with tez.am.* - you have to do
>
>hive -hiveconf tez.am.grouping.split-waves=1.7 -hiveconf
>tez.grouping.min-size=16777216 -hiveconf mapred.max.split.size=33554432
>
>
>We hope to throw away these hacks in hive-1.2; for this, Prasanth checked
>in a couple of different split strategies for ORC in hive-1.2.0
>(ETL/BI/HYBRID etc.).
>
>I will probably send out my slides about ORC (incl. new split gen) after
>Hadoop Summit Europe, if you want more details.
>
>Ideally, any tests with the latest code would help me fix anything that's
>specific to your use-cases.
>
>
>Cheers,
>Gopal