Michael Wechner wrote:
Hi

Apologies if this is an obvious question, but what is the actual purpose
of the nutch-*.job file?

It contains all the classes and plugins needed to run a Nutch job on a Hadoop cluster. A Hadoop cluster doesn't have to be used for Nutch; there are many other interesting applications for it, so core Hadoop is independent of any Nutch classes.

So, when a job is submitted to the cluster, there must be a way to transmit all the necessary implementation classes so that tasks on individual nodes can execute the Nutch code. This is the purpose of the job file: it is expanded on each node, and all classes and plugins are loaded by the task's classloader.
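
If you want to see what such an archive carries, here is a minimal sketch, assuming the job file is an ordinary zip/jar-format archive. It counts entries by top-level directory. The class name JobFileInspector, the helper writeSampleArchive, and the sample entry names are all made up for illustration and do not describe the exact layout of a real nutch-*.job file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class JobFileInspector {

    /** Counts the non-directory entries of a zip-format archive, grouped by top-level path segment. */
    static Map<String, Integer> summarize(Path archive) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                int slash = name.indexOf('/');
                String top = slash < 0 ? "(root)" : name.substring(0, slash);
                counts.merge(top, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Writes a tiny stand-in archive; the entry names are hypothetical, not a real job file listing. */
    static Path writeSampleArchive() throws IOException {
        Path tmp = Files.createTempFile("sample", ".job");
        String[] names = {
            "org/example/Crawl.class",
            "plugins/parse-example/plugin.xml",
            "plugins/parse-example/parse-example.jar",
            "lib/example.jar",
            "example-default.xml"
        };
        try (OutputStream out = Files.newOutputStream(tmp);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name)); // empty entry, only the name matters here
                zip.closeEntry();
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to a job file as the first argument, or fall back to the stand-in archive.
        Path archive = args.length > 0 ? Paths.get(args[0]) : writeSampleArchive();
        summarize(archive).forEach((dir, n) -> System.out.println(dir + ": " + n));
    }
}
```

Running it with the path to your own job file as the only argument should give a rough picture of how much of the archive is classes, libraries, and plugins.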

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

