Dear Community,

I'm running the Inject job programmatically from within IntelliJ, with the target (YARN) cluster's configuration and the Nutch configuration on the classpath. In addition, the HADOOP and NUTCH CONF and HOME directories are set to distributions that I have on my local machine.

When I start the program, the Nutch Inject job connects to YARN 2.8.0 and starts correctly. However, during the initialization (setup) phase of the mapper (InjectMapper), an exception is thrown:

    Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
        at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
        at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
        at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
        at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
        at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)

On the YARN NodeManagers there is a Nutch distribution whose configuration (nutch-site.xml) has a "plugin.folders" key that points to the plugin folders by an absolute path.

On the YARN side, I've set up additional environment variables for the NMs, as follows:

    <property>
      <name>yarn.nodemanager.admin-env</name>
      <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX,NUTCH_CONF_DIR=/opt/apache-nutch-1.13/conf/,NUTCH_HOME=/opt/apache-nutch-1.13/</value>
    </property>

In addition, I have set the MR environment variables as well:

    <property>
      <name>mapred.child.env</name>
      <value>NUTCH_HOME=/opt/apache-nutch-1.13,NUTCH_CONF_DIR=/opt/apache-nutch-1.13/conf</value>
    </property>

I've also tried running the program with JVM parameters, supplied with -D, to define "plugin.folders", without success.

I'm probably missing something. How should I define "plugin.folders" when the inject job is submitted and run remotely?

Thanks for helping me out.

Zoltán
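
P.S. For reference, the "plugin.folders" entry in nutch-site.xml on the NodeManagers has roughly the shape below. This is only a sketch of the standard property format: the path shown is an illustrative placeholder derived from the NUTCH_HOME value above, not necessarily the exact value in use.

```xml
<!-- Sketch of the nutch-site.xml entry on the NodeManagers.
     The value is a hypothetical absolute path, assumed for illustration. -->
<property>
  <name>plugin.folders</name>
  <value>/opt/apache-nutch-1.13/plugins</value>
</property>
```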

