Hi lewismc! This is very cool, thanks! Please let us know if the Nutch JIRA project has an umbrella issue for the Tez integration tasks. I think further adaptation steps will be needed for full integration (such as the counters you mentioned).
Regarding the initial performance improvements: I guess that for shorter tasks you can already see a gain because of the default *tez.am.container.reuse.enabled=true*. This applies especially to short runtimes, where e.g. JVM startup and warmup time really count. Your Tez runtimes also look like a cold -> warm pattern to me; I hope that reading is accurate.

1  MapReduce  11523  00:00:34
2  MapReduce  11523  00:00:32
3  MapReduce  11523  00:00:34
4  Tez        11523  00:00:42
5  Tez        11523  00:00:13
6  Tez        11523  00:00:14

Regards,
Laszlo Bodor

On Tue, 22 Dec 2020 at 05:23, Lewis John McGibbney <lewi...@apache.org> wrote:

> Hi user@,
> Thanks to the assistance of several Tez Committers I've been able to pull
> together the following documentation covering my experiences running Apache
> Nutch on Apache Tez
> https://cwiki.apache.org/confluence/display/NUTCH/Running+Nutch+on+Tez
> Thank you to everyone that kindly assisted with my learning so far. More
> to come...
> My next experiments will involve substituting the MapReduce counters with
> the Tez ones... I'll probably create a tutorial for this which we can
> generalize and put on the Tez website. Hopefully this will act as a go-to
> resource for application developers looking to evolve their MapReduce
> applications --> Tez.
> lewismc
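
P.S. To put a rough number on the cold -> warm pattern in the six runtimes from this thread, here is a minimal Python sketch. It assumes that run 4 is the cold Tez start and that runs 5 and 6 are warm (that split is only my reading of the numbers). The names `RUNS`, `to_seconds` and `seconds_by_engine` are made up for this illustration. With three runs per engine this is a sanity check, not a benchmark.

```python
from statistics import mean

# Runtimes copied from the table in this thread: (run, engine, "HH:MM:SS").
RUNS = [
    (1, "MapReduce", "00:00:34"),
    (2, "MapReduce", "00:00:32"),
    (3, "MapReduce", "00:00:34"),
    (4, "Tez", "00:00:42"),
    (5, "Tez", "00:00:13"),
    (6, "Tez", "00:00:14"),
]


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_by_engine(runs):
    """Group runtimes (in seconds) by engine, preserving run order."""
    grouped = {}
    for _, engine, hms in runs:
        grouped.setdefault(engine, []).append(to_seconds(hms))
    return grouped


by_engine = seconds_by_engine(RUNS)
mr_mean = mean(by_engine["MapReduce"])

# Assumption: the first Tez run is the cold start, the remaining ones are warm.
tez_cold = by_engine["Tez"][0]
tez_warm_mean = mean(by_engine["Tez"][1:])
warm_speedup = mr_mean / tez_warm_mean

print(f"MapReduce mean: {mr_mean:.1f}s")
print(f"Tez cold run:   {tez_cold}s")
print(f"Tez warm mean:  {tez_warm_mean:.1f}s")
print(f"Warm Tez vs MapReduce speedup: {warm_speedup:.2f}x")
```

Under that assumption the cold Tez run is slower than every MapReduce run, while the warm runs are roughly 2.5x faster than the MapReduce mean. That would fit with container reuse paying off once the containers are up.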