Hi Lewis,

If there is no incompatibility, your existing job will run well on Tez without any code changes. You can just follow this guide <https://tez.apache.org/install.html> (especially step 4) to try it out.

Thanks,
Zhiyuan

On Mon, Dec 14, 2020 at 9:04 AM Lewis John McGibbney <lewi...@apache.org> wrote:

> Hi László,
> Thanks for your response.
>
> On 2020/12/12 09:43:33, László Bodor <bodorlaszlo0...@gmail.com> wrote:
> > Hi Lewis!
> >
> > Just for curiosity's sake, could you please point me to a place in the
> > Nutch code where some of the steps of the workflow are compiled into /
> > done by MapReduce?
>
> Please see my response to Zhiyuan earlier in this thread. I have broken
> down the Injector job and tried to describe the MapReduce logic without
> going into too many specifics. It would be greatly appreciated if you were
> able to take a look at that. Also, do you have any general guidance on how
> one would go about porting a MapReduce job to the Tez programming model?
> It's not clear to me how one identifies candidate Vertices and Edges.
> Thank you.
>
> > Also - again for curiosity's sake - what about the adoption level of
> > Apache Nutch? Could you please send references about Nutch adopters?
> > This looks like an interesting project.
>
> Nutch is probably the most popular open source crawler. I understand that
> Doug Cutting and others began writing it and realized that in order to
> scale the Web crawler they needed a distributed computing model. The Hadoop
> project was born out of Nutch, so that gives you an idea of how long it's
> been around. I've been on the project for many years and have
> interacted with literally thousands of people on the mailing lists. I
> suspect that it is deployed in a lot of places. I will also say that
> it is not a particularly easy code base to understand... it is quite
> complex. Even though Nutch has a sensible default configuration,
> unfortunately it is notoriously difficult to configure, as it has, similar
> to Hadoop, literally hundreds of configuration parameters which may need to
> be tuned.
>
> Thank you for assisting me with better understanding the process of
> evolving MapReduce jobs --> Tez.
> lewismc