Hey guys,
I have a question about data locality in Tez.
With the same type of input and the same computation logic, I see:
MapReduce data locality: 95 %
Tez data locality: 50 %
I have a custom InputInitializer that does the following:
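InputSplit[] splits = inputFormat.getSplits(conf, desiredSplits);
List<Event> events = Lists.newArrayList();
List<TaskLocationHint> locationHints = Lists.newArrayList();
// One location hint per split, built from the hosts reported by the split.
for (InputSplit split : splits) {
  // createTaskLocationHint expects Set<String> for hosts and racks; no rack hints are given.
  locationHints.add(TaskLocationHint.createTaskLocationHint(
      Sets.newHashSet(split.getLocations()), null));
}
VertexLocationHint locationHint = VertexLocationHint.create(locationHints);
// splits is an array, so the task count is splits.length.
InputConfigureVertexTasksEvent configureVertexEvent =
    InputConfigureVertexTasksEvent.create(splits.length, locationHint,
        InputSpecUpdate.getDefaultSinglePhysicalInputSpecUpdate());
events.add(configureVertexEvent);
// The configure event is already in the list, so events.size() - 1 is the 0-based split index.
for (InputSplit split : splits) {
  // The cast assumes inputFormat only returns TezSplit instances.
  events.add(InputDataInformationEvent.createWithSerializedPayload(
      events.size() - 1, ByteBuffer.wrap(((TezSplit) split).toByteArray())));
}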
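
One thing that could be checked before the hints are handed to Tez is what the splits actually report as locations. The following is only a minimal, dependency-free sketch: SplitLocationStats and summarize are hypothetical names, not part of Tez or Hadoop, and the input is assumed to be the arrays returned by getLocations() for each split.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Tez or Hadoop.
final class SplitLocationStats {
    final int totalSplits;
    final int splitsWithoutHosts;
    final Set<String> distinctHosts;

    private SplitLocationStats(int totalSplits, int splitsWithoutHosts, Set<String> distinctHosts) {
        this.totalSplits = totalSplits;
        this.splitsWithoutHosts = splitsWithoutHosts;
        this.distinctHosts = distinctHosts;
    }

    // Each array is assumed to be what getLocations() returned for one split.
    static SplitLocationStats summarize(List<String[]> locationsPerSplit) {
        int withoutHosts = 0;
        Set<String> hosts = new TreeSet<>();
        for (String[] locations : locationsPerSplit) {
            boolean foundHost = false;
            if (locations != null) {
                for (String host : locations) {
                    // Skip null or empty entries so they do not count as usable hints.
                    if (host != null && !host.isEmpty()) {
                        hosts.add(host);
                        foundHost = true;
                    }
                }
            }
            if (!foundHost) {
                withoutHosts++;
            }
        }
        return new SplitLocationStats(locationsPerSplit.size(), withoutHosts, hosts);
    }

    @Override
    public String toString() {
        return "splits=" + totalSplits
            + ", splitsWithoutHosts=" + splitsWithoutHosts
            + ", distinctHosts=" + distinctHosts;
    }

    public static void main(String[] args) {
        // Made-up host names, only to show the call.
        List<String[]> sample = Arrays.asList(
            new String[] {"node1", "node2"},
            new String[0],
            new String[] {"node2"});
        System.out.println(summarize(sample));
    }
}
```

If many splits report no hosts, or the host names are in a different form than the node names the cluster uses, that might be one place where locality gets lost, but this is only a guess.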
Is there any obvious flaw here?
Or is there an explanation for why data locality is worse with Tez?
Best,
Johannes