Amit,
The local fetch optimization is enabled by default in Tez-0.7. It slightly
reduces the number of connections, because files generated on the same box
are read directly instead of being fetched.

Another optimization, which is far more useful, is the shared fetch
optimization. This tries to avoid copying the same data onto the same host
multiple times. We've seen fairly good gains when fetching data to 10K
reducers from a single source: 28 minutes down to 2 minutes. There's an
example, BroadcastLoadGen, which can be used to try out this feature.

For the local fetch optimization, you could use the same job (it may need
some modification) to control the amount of data generated and fetched by
each node, i.e. measure the advantage of a local read over a fetch at 1MB
vs. 200MB per node.
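
In case it helps with the CI setup, here is a rough sketch of how the test
cases could be enumerated: local fetch on/off crossed with the two data
sizes mentioned above. The property name is what I recall from
TezRuntimeConfiguration, so please verify it against your Tez version. The
helper build_test_matrix, the case names and the dict layout are made up for
illustration and are not part of Tez.

```python
from itertools import product

# Property name as recalled from TezRuntimeConfiguration (an assumption;
# check it against the Tez version you are testing).
LOCAL_FETCH_KEY = "tez.runtime.optimize.local.fetch"

MB = 1024 * 1024


def build_test_matrix(sizes_bytes=(1 * MB, 200 * MB)):
    """Return one case per (local fetch on/off, per-node data size) pair.

    Hypothetical helper: the case layout is illustrative only.
    """
    cases = []
    for enabled, size in product((True, False), sizes_bytes):
        cases.append({
            # Hypothetical naming scheme for the CI job.
            "name": f"local_fetch_{'on' if enabled else 'off'}_{size // MB}MB",
            # Configuration override to pass to the job under test.
            "conf": {LOCAL_FETCH_KEY: str(enabled).lower()},
            # Amount of data each node should generate and fetch.
            "bytes_per_node": size,
        })
    return cases


if __name__ == "__main__":
    for case in build_test_matrix():
        print(case["name"], case["conf"], case["bytes_per_node"])
```

Each case would then be run against the (possibly modified) load-gen job,
comparing runtimes between the on and off variants at the same data size.
How the size is passed to the job depends on the modification you make, so
that part is left out.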

HTH
- Sid

On Fri, May 15, 2015 at 11:19 AM, Amit Tiwari <[email protected]> wrote:

> Hey guys,
> Local fetch optimization seems like an awesome feature. I'd like to add
> some tests for our CI/CD pipeline that exercise this feature.
> Any thoughts on what kind of setup, data etc I may need for this?
> thanks
> --amit
>
>
