Hey guys,

I have a question about data locality in Tez.
With the same type of input and the same computation logic I see:
MapReduce data locality: 95%
Tez data locality: 50%

I have a custom InputInitializer that does the following:

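        InputSplit[] splits = inputFormat.getSplits(conf, desiredSplits);

        List<Event> events = Lists.newArrayList();
        List<TaskLocationHint> locationHints = Lists.newArrayList();
        for (InputSplit split : splits) {
            // createTaskLocationHint takes Set<String> hosts, so wrap the String[] from getLocations()
            Set<String> hosts = new HashSet<>(Arrays.asList(split.getLocations()));
            locationHints.add(TaskLocationHint.createTaskLocationHint(hosts, null));
        }
        VertexLocationHint locationHint = VertexLocationHint.create(locationHints);

        // splits is an array, so use .length rather than .size()
        InputConfigureVertexTasksEvent configureVertexEvent = InputConfigureVertexTasksEvent.create(
            splits.length, locationHint, InputSpecUpdate.getDefaultSinglePhysicalInputSpecUpdate());
        events.add(configureVertexEvent);
        for (InputSplit split : splits) {
            // events already holds the configure event, so events.size() - 1 is this split's task index
            // (assumes every split is a TezSplit, my custom split type)
            events.add(InputDataInformationEvent.createWithSerializedPayload(
                events.size() - 1, ByteBuffer.wrap(((TezSplit) split).toByteArray())));
        }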

Any obvious flaw here?
Or an explanation of why data locality is worse?
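
To rule out bad hints on my side, a rough sanity check could look like the sketch below. It is plain Java with no Tez or Hadoop classes: it only takes the String[] that each split.getLocations() call returned and reports how many splits would get an empty hint and how often each host is named. The class name LocationHintStats and its methods are made up for this mail, not part of any library.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper (not a Tez API): summarizes the host lists that
// would be turned into TaskLocationHints.
final class LocationHintStats {

    // Hosts for one task, with null/blank entries dropped and duplicates removed.
    static Set<String> toHostSet(String[] locations) {
        Set<String> hosts = new LinkedHashSet<>();
        if (locations != null) {
            for (String host : locations) {
                if (host != null && !host.trim().isEmpty()) {
                    hosts.add(host.trim());
                }
            }
        }
        return hosts;
    }

    // Number of splits that would end up with an empty location hint.
    static int countSplitsWithoutHosts(List<String[]> locationsPerSplit) {
        int empty = 0;
        for (String[] locations : locationsPerSplit) {
            if (toHostSet(locations).isEmpty()) {
                empty++;
            }
        }
        return empty;
    }

    // How many splits name each host, sorted by host name.
    static Map<String, Integer> splitsPerHost(List<String[]> locationsPerSplit) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] locations : locationsPerSplit) {
            for (String host : toHostSet(locations)) {
                counts.merge(host, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the getLocations() result of every split should show whether some splits carry no hosts at all, and whether the host names look the way the cluster's nodes are registered (for example short names versus fully qualified names). That is only a guess at where to look, not a confirmed cause.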

best
Johannes
