Hi lewismc!

This is very cool, thanks!
Please let us know if the Nutch JIRA project has an umbrella issue for Tez
integration tasks. I think further adaptation steps will be needed for full
integration (like the counters you mentioned).

Regarding the initial performance improvements: I guess you can already see a
gain on shorter tasks because *tez.am.container.reuse.enabled=true* is the
default. Container reuse matters most for short runtimes, where e.g. JVM
startup and warm-up time really count. Your Tez runtimes also look like a
cold -> warm pattern to me; I hope that reading is accurate.

1  MapReduce  11523  00:00:34
2  MapReduce  11523  00:00:32
3  MapReduce  11523  00:00:34
4  Tez        11523  00:00:42
5  Tez        11523  00:00:13
6  Tez        11523  00:00:14
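
To put rough numbers on that cold -> warm reading, here is a minimal Python
sketch that only redoes the arithmetic on the six runtimes above. Treating
the first Tez run as the "cold" one and the other two as "warm" is my
assumption, not something that was measured, and the helper name to_seconds
is made up for this example.

```python
from statistics import mean


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Runtimes copied from the six runs listed above, in order.
runs = [
    ("MapReduce", "00:00:34"),
    ("MapReduce", "00:00:32"),
    ("MapReduce", "00:00:34"),
    ("Tez", "00:00:42"),
    ("Tez", "00:00:13"),
    ("Tez", "00:00:14"),
]

mr_times = [to_seconds(t) for engine, t in runs if engine == "MapReduce"]
tez_times = [to_seconds(t) for engine, t in runs if engine == "Tez"]

mr_mean = mean(mr_times)

# Assumption: the first Tez run is the cold one, the later runs are warm.
tez_cold = tez_times[0]
tez_warm_mean = mean(tez_times[1:])

# How much faster the warm Tez runs are than the MapReduce average.
speedup = mr_mean / tez_warm_mean

print(f"MapReduce mean: {mr_mean:.1f}s")
print(f"Tez cold run:   {tez_cold}s")
print(f"Tez warm mean:  {tez_warm_mean:.1f}s")
print(f"Warm Tez vs MapReduce: {speedup:.2f}x faster")
```

With only three runs per engine this is just an indication, so more
repetitions would be needed before drawing firm conclusions.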


Regards,
Laszlo Bodor


On Tue, 22 Dec 2020 at 05:23, Lewis John McGibbney <lewi...@apache.org>
wrote:

> Hi user@,
> Thanks to the assistance of several Tez Committers I've been able to pull
> together the following documentation covering my experiences running Apache
> Nutch on Apache Tez
> https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez
> Thank you to everyone that kindly assisted with my learning so far. More
> to come...
> My next experiments will involve substituting the MapReduce counters with
> the Tez ones... I'll probably create a tutorial for this which we can
> generalize and put on the Tez website. Hopefully this will act as a go-to
> resource for application developers looking to evolve their MapReduce
> applications --> Tez.
> lewismc
>
