Hi Lewis,

If there is no incompatibility, your existing job will run well on Tez
without any code changes. You can follow this guide
<https://tez.apache.org/install.html> (especially step 4) to try it out.
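
As a rough sketch of what that amounts to, assuming step 4 is the optional
mapred-site.xml change for running existing MapReduce jobs on Tez (please
check the guide for your Tez version, since the step numbering may differ):

```xml
<!-- mapred-site.xml: assumed to be the change that step 4 of the guide describes -->
<property>
  <name>mapreduce.framework.name</name>
  <!-- the default value is "yarn"; "yarn-tez" submits MapReduce jobs through Tez -->
  <value>yarn-tez</value>
</property>
```

This only takes effect once the Tez libraries are deployed and on the client
classpath, as described in the other steps of the guide.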

Thanks,
Zhiyuan

On Mon, Dec 14, 2020 at 9:04 AM Lewis John McGibbney <lewi...@apache.org>
wrote:

> Hi László,
> Thanks for your response
>
> On 2020/12/12 09:43:33, László Bodor <bodorlaszlo0...@gmail.com> wrote:
> > Hi Lewis!
> >
> > Just for curiosity's sake, could you please point me to a place in nutch
> > code where some of the steps of the workflow are compiled into / done by
> > MapReduce?
>
> Please see my response to Zhiyuan earlier in this thread. I have broken
> down the Injector job and tried to describe the MapReduce logic without
> going into too many specifics. It would be greatly appreciated if you were
> able to take a look at that. Also, do you have any general guidance on how
> one would go about porting a MapReduce job to the Tez programming model?
> It's not clear to me how one identifies candidate Vertices and Edges. Thank
> you
>
> > Also - again for curiosity's sake - what about the adoption level of
> > Apache Nutch? Could you please send references about Nutch adopters? This
> > looks like an interesting project.
>
> Nutch is probably the most popular open source crawler. I understand that
> Doug Cutting and others began writing it and realized that in order to
> scale the Web crawler they needed a distributed computing model. The Hadoop
> project was born out of Nutch so that gives you an idea of how long it's
> been around for. I've been on the project for many years and have
> interacted with literally thousands of people on the mailing lists. I
> suspect that it is in deployment in a lot of places. I will also say that
> it is not a particularly easy code base to understand... it is quite
> complex. Even though Nutch has sensible default configuration,
> unfortunately it is notoriously difficult to configure as it has, similar
> to Hadoop, literally hundreds of configuration parameters which may need to
> be tuned.
>
> Thank you for assisting me with better understanding the process of
> evolving MapReduce jobs --> Tez.
> lewismc
>
