Hi Jonathan,

Thank you for the response. This is very useful.

Using your configuration I am able to run the Tez examples without any problem. The issue arises when I attempt to run Nutch. No matter what I've tried, the dependencies for Nutch are never found. I've tried building a binary .tar.gz distribution of Nutch and referencing its URI on HDFS; this does not work and I get ClassNotFound exceptions. I've tried referencing the Nutch .job artifact, which contains all dependencies; this does not work either.

Just to confirm, I can successfully execute all Nutch jobs when the 'mapreduce.framework.name' value is set to 'yarn'. We execute the jobs as follows:

hadoop jar ${NUTCH.job} $CLASS $arguments

I feel like I am very close to getting this running. I wonder if someone on this list could make an attempt at running a job and see whether they can reproduce the problem? I've uploaded the compiled .job and the nutch bash script at
https://drive.google.com/drive/folders/1yjGi8UWVZithcYWLgUINm9v6IU2Scmy5?usp=sharing

You can execute the Injector tool by running

./nutch inject crawldb urls

assuming that urls is a directory on HDFS containing a simple text file with one URL entry, e.g. http://tez.apache.org

Again, thank you all for any further direction. I am really keen to get Nutch running on Tez.

lewismc

On 2020/12/17 18:09:02, Jonathan Eagles <jeag...@gmail.com> wrote:
> This is what I use in production that has many benefits.
> In this case mapreduce.application.framework.path is the runtime classpath
> tar.gz file that is a custom-built mapreduce runtime environment, perhaps
> similar to nutch.
> 1) localizing one tar.gz file instead of many individual jars
> 2) minimal jar has fewer class conflicts and a smaller footprint
> 3) localizing tez to the tez folder (#tez) allows better control of the
> classpath to avoid inconsistent Java classpath resolution of jars in the
> same directory
> 4) setting tez.use.cluster.hadoop-libs to false avoids using the jars from
> the individual nodemanagers and relies only on jars listed in tez.lib.uris
>
> <property>
>   <name>mapreduce.application.framework.path</name>
>   <value>/hdfs/path/hadoop-mapreduce-${mapreduce.application.framework.version}.tgz#hadoop-mapreduce</value>
> </property>
>
> <property>
>   <name>tez.lib.uris</name>
>   <value>/hdfs/path/tez-0.9.2-minimal.tar.gz#tez,${mapreduce.application.framework.path}</value>
> </property>
> <property>
>   <name>tez.lib.uris.classpath</name>
>   <value>${mapreduce.application.classpath},./tez/*,./tez/lib/*</value>
> </property>
> <property>
>   <name>tez.use.cluster.hadoop-libs</name>
>   <value>false</value>
> </property>
>
> On Thu, Dec 17, 2020 at 11:57 AM Lewis John McGibbney <lewi...@apache.org> wrote:
>
> > I tried the following configuration in tez-site.xml with no luck
> >
> > <configuration>
> >   <property>
> >     <name>tez.lib.uris</name>
> >     <value>${fs.defaultFS}/apps/tez-0.10.1-SNAPSHOT,${fs.defaultFS}/apps/tez-0.10.1-SNAPSHOT/lib,${fs.defaultFS}/apps/nutch/apache-nutch-1.18-SNAPSHOT.job</value>
> >   </property>
> >
> >   <property>
> >     <name>tez.lib.uris.classpath</name>
> >     <value>${fs.defaultFS}/apps/nutch/apache-nutch-1.18-SNAPSHOT.job</value>
> >   </property>
> > </configuration>
> >
> > On 2020/12/17 17:35:28, Lewis John McGibbney <lewi...@apache.org> wrote:
> > > Hi Zhiyuan,
> > > Thanks for the guidance. I'm making progress but I am still battling
> > > initial configuration management issues.
> > > I'm running HDFS and YARN v3.1.4 in pseudo-distributed mode.
> > > My tez-site.xml contains the following content
> > >
> > > <configuration>
> > >   <property>
> > >     <name>tez.lib.uris</name>
> > >     <value>${fs.defaultFS}/apps/tez-0.10.1-SNAPSHOT,${fs.defaultFS}/apps/tez-0.10.1-SNAPSHOT/lib,${fs.defaultFS}/apps/nutch</value>
> > >   </property>
> > > </configuration>
> > >
> > > N.B. When I attempted to use the compressed Tez tar.gz, I was running
> > > into classpath issues which are largely documented in the installation
> > > documentation you pointed me to. I overcame these issues by simply
> > > uploading the minimal directory. All seems fine at this stage as I can
> > > run all of the Tez examples.
> > >
> > > I run into trouble when I try to run any job from the Nutch application.
> > > For example, when I run the Injector, one of the Nutch plugin extension
> > > points (x point org.apache.nutch.net.URLNormalizer) cannot be found.
> > > The relevant log can be seen at https://paste.apache.org/4whoe.
> > > I should note that the entire Nutch .job is available on HDFS at the URI
> > > defined in the tez-site.xml above.
> > >
> > > The output of jar -tf on the nutch.job artifact can be seen at
> > > https://paste.apache.org/hl8tk.
> > > Am I required to somehow describe the structural hierarchy of this
> > > artifact in the tez.lib.uris.classpath configuration property?
> > >
> > > Thank you again for any guidance.
> > >
> > > lewismc
> > >
> > > On 2020/12/14 03:23:48, Zhiyuan Yang <zhiyu...@apache.org> wrote:
> > > > Hi Lewis,
> > > >
> > > > If there is no incompatibility, your existing job will run well on Tez
> > > > without code change. You can just follow this guide
> > > > <https://tez.apache.org/install.html> (especially step 4) to try it out.
> > > >
> > > > Thanks,
> > > > Zhiyuan
> > > >
> > > > On Mon, Dec 14, 2020 at 9:04 AM Lewis John McGibbney <lewi...@apache.org> wrote:
> > > > >
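
On the question of the structural hierarchy of the .job artifact: as a complement to the jar -tf listing, a minimal Python sketch along these lines could summarize the archive's top-level layout, which may help when deciding which relative entries to try in tez.lib.uris.classpath. It assumes only that the .job is a zip-format archive (jar -tf can read it). The function name summarize_job_layout and the script name in the usage note are hypothetical, and the script does not by itself resolve the classpath problem.

```python
import sys
import zipfile
from collections import Counter


def summarize_job_layout(job_path):
    """Count the files under each top-level directory of a zip-format .job."""
    counts = Counter()
    with zipfile.ZipFile(job_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top, sep, _ = name.partition("/")
            # Files with no directory component are grouped under "(root)".
            counts[top + "/" if sep else "(root)"] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python summarize_job.py path/to/file.job
    for prefix, count in sorted(summarize_job_layout(sys.argv[1]).items()):
        print(f"{count:6d}  {prefix}")
```

Running it against a local copy of apache-nutch-1.18-SNAPSHOT.job prints one line per top-level directory with the number of files beneath it.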