Hi Zoltán,

a warning ahead: I've never tried to launch Nutch remotely myself, so I don't have a ready solution.

If the property "plugin.folders" is not known, this means Nutch also didn't
read nutch-default.xml, where it is defined. I would start by checking whether
the classpath contains the configuration folder (local mode) or the
apache-nutch-*.job file (distributed mode).

Note that the environment variable NUTCH_CONF_DIR is used only by bin/nutch:
the path is added to the classpath. Loading of the configuration files
(nutch-site.xml and nutch-default.xml) is delegated to Hadoop. Similarly,
NUTCH_HOME is only used to find the Nutch installation or the job file.

To analyze the problem, try changing log4j.logger.org.apache.hadoop=WARN
to INFO or DEBUG.

Best,
Sebastian

On 07/18/2017 08:50 PM, Zoltán Zvara wrote:
> Dear Community,
>
> I'm running Inject job programmatically, from within IntelliJ, where the
> target cluster's (YARN) configuration and Nutch configuration is in the
> classpath. In addition to this, HADOOP and NUTCH CONF and HOME directories
> are set - to distributions that I have on my local machine.
>
> Starting the program, the Nutch Inject connects to YARN 2.8.0 and the inject
> job starts correctly. However, during the initialization (setup) phase of the
> mapper (InjectMapper), an exception is thrown:
>
> Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
> at
> org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
> at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
> at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
> at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
> at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
>
> On the YARN NodeManagers, a Nutch distribution is sitting with a
> configuration (nutch-site.xml) that has a key "plugin.folders" that points to
> the plugin folders by an absolute path. As for YARN, I've set up additional
> environment variables for NMs, as follows:
>
> <property>
> <name>yarn.nodemanager.admin-env</name>
> <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,NUTCH_CONF_DIR=/opt/apache-nutch-1.13/conf/,NUTCH_HOME=/opt/apache-nutch-1.13/</value>
> </property>
>
> In addition to this, I have set MR environment variables as well:
>
> <property>
> <name>mapred.child.env</name>
> <value>NUTCH_HOME=/opt/apache-nutch-1.13,NUTCH_CONF_DIR=/opt/apache-nutch-1.13/conf</value>
> </property>
>
> I've tried to run the program with JVM parameters, supplied with -D to define
> "plugin.folders".
>
> Probably I'm missing something. How should I define "plugin.folders", when
> the inject job is submitted and run remotely?
>
> Thanks for helping me out.
>
> Zoltán
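
P.S. A rough sketch of the classpath check suggested in the reply above, in plain Java with no Nutch or Hadoop dependency. It asks the class loader where nutch-default.xml and nutch-site.xml resolve from, which is also how Hadoop's Configuration looks up resources given by name. The class name ClasspathCheck and the helper locate() are made up for illustration. Note that it only inspects the JVM it runs in (for example the launcher started from IntelliJ), so it says nothing about the classpath of the map task on the NodeManager.

```java
import java.net.URL;

public class ClasspathCheck {

    // Hypothetical helper: returns the URL the class loader resolves for the
    // given resource name, or null if the resource is not on the classpath.
    static URL locate(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClasspathCheck.class.getClassLoader();
        }
        return cl.getResource(resource);
    }

    public static void main(String[] args) {
        // The two configuration files mentioned in the reply.
        String[] names = {"nutch-default.xml", "nutch-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> "
                    + (url == null ? "NOT on classpath" : url.toString()));
        }
        // Print the raw classpath of this JVM for comparison.
        System.out.println("java.class.path = "
                + System.getProperty("java.class.path"));
    }
}
```

If nutch-default.xml is reported as missing, the configuration folder (or the job file containing it) is not on the classpath of that JVM, which would be consistent with "plugin.folders is not defined".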