Hi Clark,

This is a lot of information... thank you for compiling it all. Ideally the version of Hadoop being used with Nutch should ALWAYS match the Hadoop binaries referenced in https://github.com/apache/nutch/blob/master/ivy/ivy.xml. This way you won't run into classpath issues. I would like to encourage you to create a wiki page so we can document this in a user-friendly way... would you be open to that? You can create an account at https://cwiki.apache.org/confluence/display/NUTCH/Home

Thanks for your consideration.

lewismc
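As an illustration of that advice, one rough way to check that the versions actually line up (a sketch only; it assumes NUTCH_HOME points at your Nutch checkout and that the job artifact carries the name used elsewhere in this thread, so adjust the paths to your layout):

# Hadoop version the cluster itself reports
hadoop version | head -n1

# Hadoop revision Nutch declares in ivy/ivy.xml
grep -o 'name="hadoop-[^"]*" rev="[^"]*"' "$NUTCH_HOME/ivy/ivy.xml" | sort -u

# Hadoop jars actually bundled into the job artifact
unzip -l "$NUTCH_HOME/runtime/deploy/apache-nutch-1.19-SNAPSHOT.job" | grep -o 'hadoop-[^ /]*\.jar' | sort -u

All three should report the same Hadoop version; if they differ, aligning the rev values in ivy/ivy.xml and rebuilding is probably cleaner than swapping jars by hand.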
On 2021/07/14 18:27:23, Clark Benham <cl...@thehive.ai> wrote: > Hi All, > > Sebastian helped fix my issue: using S3 as a backend I was able to get > nutch-1.19 working with pre-built hadoop-3.3.0 and Java 11. There was an > oddity that nutch-1.19 had 11 hadoop 3.1.3 jars, e.g. > hadoop-hdfs-3.1.3.jar, hadoop-yarn-api-3.1.3.jar, ...; this made running > `hadoop version` give 3.1.3, so I replaced those 3.1.3 jars with the 3.3.0 > jars from the hadoop download. > Also, in the main nutch branch ( > https://github.com/apache/nutch/blob/master/ivy/ivy.xml) ivy.xml currently > has dependencies on hadoop-3.1.3; e.g.: > <!-- Hadoop Dependencies --> > <dependency org="org.apache.hadoop" name="hadoop-common" rev="3.1.3" > conf="*->default"> > <exclude org="hsqldb" name="hsqldb" /> > <exclude org="net.sf.kosmosfs" name="kfs" /> > <exclude org="net.java.dev.jets3t" name="jets3t" /> > <exclude org="org.eclipse.jdt" name="core" /> > <exclude org="org.mortbay.jetty" name="jsp-*" /> > <exclude org="ant" name="ant" /> > </dependency> > <dependency org="org.apache.hadoop" name="hadoop-hdfs" rev="3.1.3" > conf="*->default" /> > <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" > rev="3.1.3" conf="*->default" /> > <dependency org="org.apache.hadoop" > name="hadoop-mapreduce-client-jobclient" rev="3.1.3" conf="*->default" /> > <!-- End of Hadoop Dependencies --> > > I set yarn.nodemanager.local-dirs to '${hadoop.tmp.dir}/nm-local-dir'. > > I didn't change "mapreduce.job.dir" because there are no namenode or > datanode processes running when using hadoop with S3, so the UI is blank. > > Copied from Email with Sebastian: > > > The plugin loader doesn't appear to be able to read from s3 in > nutch-1.18 > > > with hadoop-3.2.1[1]. > > > I had a look into the plugin loader: it can only read from the local file > system. > > But that's ok because the Nutch job file is copied to the local machine > > and unpacked. Here is how the paths look on one of the running > Common Crawl > > task nodes: > > The configs for the working hadoop are as follows: > > core-site.xml > > <?xml version="1.0" encoding="UTF-8"?> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> > > <!-- > > Licensed under the Apache License, Version 2.0 (the "License"); > > you may not use this file except in compliance with the License. > > You may obtain a copy of the License at > > > http://www.apache.org/licenses/LICENSE-2.0 > > > Unless required by applicable law or agreed to in writing, software > > distributed under the License is distributed on an "AS IS" BASIS, > > WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > See the License for the specific language governing permissions and > > limitations under the License. See accompanying LICENSE file. > > --> > > > <!-- Put site-specific property overrides in this file. --> > > <configuration> > > <property> > > <name>hadoop.tmp.dir</name> > > <value>/home/hdoop/tmpdata</value> > > </property> > > <property> > > <name>fs.defaultFS</name> > > <value>s3a://my-bucket</value> > > </property> > > > <property> > > <name>fs.s3a.access.key</name> > > <value>KEY_PLACEHOLDER</value> > > <description>AWS access key ID. > > Omit for IAM role-based or provider-based authentication.</description> > > </property> > > > <property> > > <name>fs.s3a.secret.key</name> > > <value>SECRET_PLACEHOLDER</value> > > <description>AWS secret key.
> > Omit for IAM role-based or provider-based authentication.</description> > > </property> > > > <property> > > <name>fs.s3a.aws.credentials.provider</name> > > <description> > > Comma-separated class names of credential provider classes which > implement > > com.amazonaws.auth.AWSCredentialsProvider. > > > These are loaded and queried in sequence for a valid set of credentials. > > Each listed class must implement one of the following means of > > construction, which are attempted in order: > > 1. a public constructor accepting java.net.URI and > > org.apache.hadoop.conf.Configuration, > > 2. a public static method named getInstance that accepts no > > arguments and returns an instance of > > com.amazonaws.auth.AWSCredentialsProvider, or > > 3. a public default constructor. > > > Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider > allows > > anonymous access to a publicly accessible S3 bucket without any > credentials. > > Please note that allowing anonymous access to an S3 bucket compromises > > security and therefore is unsuitable for most use cases. It can be > useful > > for accessing public data sets without requiring AWS credentials. > > > If unspecified, then the default list of credential provider classes, > > queried in sequence, is: > > 1. org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider: > > Uses the values of fs.s3a.access.key and fs.s3a.secret.key. > > 2. com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports > > configuration of AWS access key ID and secret access key in > > environment variables named AWS_ACCESS_KEY_ID and > > AWS_SECRET_ACCESS_KEY, as documented in the AWS SDK. > > 3. com.amazonaws.auth.InstanceProfileCredentialsProvider: supports use > > of instance profile credentials if running in an EC2 VM. > > </description> > > </property> > > > > <dependencies> > > <dependency> > > <groupId>org.apache.hadoop</groupId> > > <artifactId>hadoop-client</artifactId> > > <version>${hadoop.version}</version> > > </dependency> > > <dependency> > > <groupId>org.apache.hadoop</groupId> > > <artifactId>hadoop-aws</artifactId> > > <version>${hadoop.version}</version> > > </dependency> > > </dependencies> > > > </configuration> > > > > hadoop-env.sh > > # > > # Licensed to the Apache Software Foundation (ASF) under one > > # omore contributor license agreements. See the NOTICE file > > # distributed with this work for additional information > > # regarding copyright ownership. The ASF licenses this file > > # to you under the Apache License, Version 2.0 (the > > # "License"); you may not use this file except in compliance > > #a with the License. You may obtain a copy of the License at > > # > > # http://www.apache.org/licenses/LICENSE-2.0 > > # > > # Unless required by applicable law or agreed to in writing, software > > # distributed under the License is distributed on an "AS IS" BASIS, > > # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > # See the License for the specific language governing permissions and > > # limitations under the License. > > > # Set Hadoop-specific environment variables here. > > > ## > > ## THIS FILE ACTS AS THE MASTER FILE FOR ALL HADOOP PROJECTS. > > ## SETTINGS HERE WILL BE READ BY ALL HADOOP COMMANDS. THEREFORE, > > ## ONE CAN USE THIS FILE TO SET YARN, HDFS, AND MAPREDUCE > > ## CONFIGURATION OPTIONS INSTEAD OF xxx-env.sh. 
> > ## > > ## Precedence rules: > > ## > > ## {yarn-env.sh|hdfs-env.sh} > hadoop-env.sh > hard-coded defaults > > ## > > ## {YARN_xyz|HDFS_xyz} > HADOOP_xyz > hard-coded defaults > > ## > > > # Many of the options here are built from the perspective that users > > # may want to provide OVERWRITING values on the command line. > > # For example: > > # > > # JAVA_HOME=/usr/java/testing hdfs dfs -ls > > # > > # Therefore, the vast majority (BUT NOT ALL!) of these defaults > > # are configured for substitution and not append. If append > > # is preferable, modify this file accordingly. > > > ### > > # Generic settings for HADOOP > > ### > > > # Technically, the only required environment variable is JAVA_HOME. > > # All others are optional. However, the defaults are probably not > > # preferred. Many sites configure these options outside of Hadoop, > > # such as in /etc/profile.d > > > # The java implementation to use. By default, this environment > > # variable is REQUIRED on ALL platforms except OS X! > > export HADOOP_HOME=~/hadoop-3.3.0 > > export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 > > export > EXTRA_PATH=/home/hdoop/hadoop-3.3.0/share/hadoop/tools/lib/hadoop-aws-3.3.0.jar:/home/hdoop/hadoop-3.3.0/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.563.jar:/home/hdoop/nutch:/home/hdoop/nutch/lib/commons-jexl3-3.1.jar:/home/hdoop/nutch/build/plugins:/home/hdoop/nutch/build/lib/* > > export PATH=$JAVA_HOME/bin:$EXTRA_PATH:$PATH > > > # Location of Hadoop. By default, Hadoop will attempt to determine > > # this location based upon its execution path. > > # export HADOOP_HOME= > > > # Location of Hadoop's configuration information. i.e., where this > > # file is living. If this is not defined, Hadoop will attempt to > > # locate it based upon its execution path. > > # > > # NOTE: It is recommend that this variable not be set here but in > > # /etc/profile.d or equivalent. Some options (such as > > # --config) may react strangely otherwise. > > # > > # export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop > > > # The maximum amount of heap to use (Java -Xmx). If no unit > > # is provided, it will be converted to MB. Daemons will > > # prefer any Xmx setting in their respective _OPT variable. > > # There is no default; the JVM will autoscale based upon machine > > # memory size. > > # export HADOOP_HEAPSIZE_MAX= > > > # The minimum amount of heap to use (Java -Xms). If no unit > > # is provided, it will be converted to MB. Daemons will > > # prefer any Xms setting in their respective _OPT variable. > > # There is no default; the JVM will autoscale based upon machine > > # memory size. > > # export HADOOP_HEAPSIZE_MIN= > > > # Enable extra debugging of Hadoop's JAAS binding, used to set up > > # Kerberos security. > > # export HADOOP_JAAS_DEBUG=true > > > # Extra Java runtime options for all Hadoop commands. We don't support > > # IPv6 yet/still, so by default the preference is set to IPv4. > > # export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true" > > # For Kerberos debugging, an extended option set logs more information > > # export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true > -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug" > > > # Some parts of the shell code may do special things dependent upon > > # the operating system. We have to set this here. See the next > > # section as to why.... > > export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)} > > > # Extra Java runtime options for some Hadoop commands > > # and clients (i.e., hdfs dfs -blah). 
These get appended to HADOOP_OPTS for > > # such commands. In most cases, # this should be left empty and > > # let users supply it on the command line. > > # export HADOOP_CLIENT_OPTS="" > > > # > > # A note about classpaths. > > # > > # By default, Apache Hadoop overrides Java's CLASSPATH > > # environment variable. It is configured such > > # that it starts out blank with new entries added after passing > > # a series of checks (file/dir exists, not already listed aka > > # de-deduplication). During de-deduplication, wildcards and/or > > # directories are *NOT* expanded to keep it simple. Therefore, > > # if the computed classpath has two specific mentions of > > # awesome-methods-1.0.jar, only the first one added will be seen. > > # If two directories are in the classpath that both contain > > # awesome-methods-1.0.jar, then Java will pick up both versions. > > > # An additional, custom CLASSPATH. Site-wide configs should be > > # handled via the shellprofile functionality, utilizing the > > # hadoop_add_classpath function for greater control and much > > # harder for apps/end-users to accidentally override. > > # Similarly, end users should utilize ${HOME}/.hadooprc . > > # This variable should ideally only be used as a short-cut, > > # interactive way for temporary additions on the command line. > > export HADOOP_CLASSPATH=$EXTRA_PATH:$JAVA_HOME/bin:$HADOOP_CLASSPATH > > > # Should HADOOP_CLASSPATH be first in the official CLASSPATH? > > # export HADOOP_USER_CLASSPATH_FIRST="yes" > > > # If HADOOP_USE_CLIENT_CLASSLOADER is set, the classpath along > > # with the main jar are handled by a separate isolated > > # client classloader when 'hadoop jar', 'yarn jar', or 'mapred job' > > # is utilized. If it is set, HADOOP_CLASSPATH and > > # HADOOP_USER_CLASSPATH_FIRST are ignored. > > # export HADOOP_USE_CLIENT_CLASSLOADER=true > > > # HADOOP_CLIENT_CLASSLOADER_SYSTEM_CLASSES overrides the default definition > of > > # system classes for the client classloader when > HADOOP_USE_CLIENT_CLASSLOADER > > # is enabled. Names ending in '.' (period) are treated as package names, and > > # names starting with a '-' are treated as negative matches. For example, > > # export > HADOOP_CLIENT_CLASSLOADER_SYSTEM_CLASSES="-org.apache.hadoop.UserClass,java.,javax.,org.apache.hadoop." > > > # Enable optional, bundled Hadoop features > > # This is a comma delimited list. It may NOT be overridden via .hadooprc > > # Entries may be added/removed as needed. > > # export > HADOOP_OPTIONAL_TOOLS="hadoop-aliyun,hadoop-openstack,hadoop-azure,hadoop-azure-datalake,hadoop-aws,hadoop-kafka" > > > ### > > # Options for remote shell connectivity > > ### > > > # There are some optional components of hadoop that allow for > > # command and control of remote hosts. For example, > > # start-dfs.sh will attempt to bring up all NNs, DNS, etc. > > > # Options to pass to SSH when one of the "log into a host and > > # start/stop daemons" scripts is executed > > # export HADOOP_SSH_OPTS="-o BatchMode=yes -o StrictHostKeyChecking=no -o > ConnectTimeout=10s" > > > # The built-in ssh handler will limit itself to 10 simultaneous connections. > > # For pdsh users, this sets the fanout size ( -f ) > > # Change this to increase/decrease as necessary. > > # export HADOOP_SSH_PARALLEL=10 > > > # Filename which contains all of the hosts for any remote execution > > # helper scripts # such as workers.sh, start-dfs.sh, etc. 
> > # export HADOOP_WORKERS="${HADOOP_CONF_DIR}/workers" > > > ### > > # Options for all daemons > > ### > > # > > > # > > # Many options may also be specified as Java properties. It is > > # very common, and in many cases, desirable, to hard-set these > > # in daemon _OPTS variables. Where applicable, the appropriate > > # Java property is also identified. Note that many are re-used > > # or set differently in certain contexts (e.g., secure vs > > # non-secure) > > # > > > # Where (primarily) daemon log files are stored. > > # ${HADOOP_HOME}/logs by default. > > # Java property: hadoop.log.dir > > # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs > > > # A string representing this instance of hadoop. $USER by default. > > # This is used in writing log and pid files, so keep that in mind! > > # Java property: hadoop.id.str > > # export HADOOP_IDENT_STRING=$USER > > > # How many seconds to pause after stopping a daemon > > # export HADOOP_STOP_TIMEOUT=5 > > > # Where pid files are stored. /tmp by default. > > # export HADOOP_PID_DIR=/tmp > > > # Default log4j setting for interactive commands > > # Java property: hadoop.root.logger > > # export HADOOP_ROOT_LOGGER=INFO,console > > > # Default log4j setting for daemons spawned explicitly by > > # --daemon option of hadoop, hdfs, mapred and yarn command. > > # Java property: hadoop.root.logger > > # export HADOOP_DAEMON_ROOT_LOGGER=INFO,RFA > > > # Default log level and output location for security-related messages. > > # You will almost certainly want to change this on a per-daemon basis via > > # the Java property (i.e., -Dhadoop.security.logger=foo). (Note that the > > # defaults for the NN and 2NN override this by default.) > > # Java property: hadoop.security.logger > > # export HADOOP_SECURITY_LOGGER=INFO,NullAppender > > > # Default process priority level > > # Note that sub-processes will also run at this level! > > # export HADOOP_NICENESS=0 > > > # Default name for the service level authorization file > > # Java property: hadoop.policy.file > > # export HADOOP_POLICYFILE="hadoop-policy.xml" > > > # > > # NOTE: this is not used by default! <----- > > # You can define variables right here and then re-use them later on. > > # For example, it is common to use the same garbage collection settings > > # for all the daemons. So one could define: > > # > > # export HADOOP_GC_SETTINGS="-verbose:gc -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps" > > # > > # .. and then use it as per the b option under the namenode. > > > ### > > # Secure/privileged execution > > ### > > > # > > # Out of the box, Hadoop uses jsvc from Apache Commons to launch daemons > > # on privileged ports. This functionality can be replaced by providing > > # custom functions. See hadoop-functions.sh for more information. > > # > > > # The jsvc implementation to use. Jsvc is required to run secure datanodes > > # that bind to privileged ports to provide authentication of data transfer > > # protocol. Jsvc is not required if SASL is configured for authentication > of > > # data transfer protocol using non-privileged ports. > > # export JSVC_HOME=/usr/bin > > > # > > # This directory contains pids for secure and privileged processes. > > #export HADOOP_SECURE_PID_DIR=${HADOOP_PID_DIR} > > > # > > # This directory contains the logs for secure and privileged processes. 
> > # Java property: hadoop.log.dir > > # export HADOOP_SECURE_LOG=${HADOOP_LOG_DIR} > > > # > > # When running a secure daemon, the default value of HADOOP_IDENT_STRING > > # ends up being a bit bogus. Therefore, by default, the code will > > # replace HADOOP_IDENT_STRING with HADOOP_xx_SECURE_USER. If one wants > > # to keep HADOOP_IDENT_STRING untouched, then uncomment this line. > > # export HADOOP_SECURE_IDENT_PRESERVE="true" > > > ### > > # NameNode specific parameters > > ### > > > # Default log level and output location for file system related change > > # messages. For non-namenode daemons, the Java property must be set in > > # the appropriate _OPTS if one wants something other than INFO,NullAppender > > # Java property: hdfs.audit.logger > > # export HDFS_AUDIT_LOGGER=INFO,NullAppender > > > # Specify the JVM options to be used when starting the NameNode. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # a) Set JMX options > > # export HDFS_NAMENODE_OPTS="-Dcom.sun.management.jmxremote=true > -Dcom.sun.management.jmxremote.authenticate=false > -Dcom.sun.management.jmxremote.ssl=false > -Dcom.sun.management.jmxremote.port=1026" > > # > > # b) Set garbage collection logs > > # export HDFS_NAMENODE_OPTS="${HADOOP_GC_SETTINGS} > -Xloggc:${HADOOP_LOG_DIR}/gc-rm.log-$(date +'%Y%m%d%H%M')" > > # > > # c) ... or set them directly > > # export HDFS_NAMENODE_OPTS="-verbose:gc -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps > -Xloggc:${HADOOP_LOG_DIR}/gc-rm.log-$(date +'%Y%m%d%H%M')" > > > # this is the default: > > # export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS" > > > ### > > # SecondaryNameNode specific parameters > > ### > > # Specify the JVM options to be used when starting the SecondaryNameNode. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # This is the default: > > # export HDFS_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS" > > > ### > > # DataNode specific parameters > > ### > > # Specify the JVM options to be used when starting the DataNode. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # This is the default: > > # export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS" > > > # On secure datanodes, user to run the datanode as after dropping > privileges. > > # This **MUST** be uncommented to enable secure HDFS if using privileged > ports > > # to provide authentication of data transfer protocol. This **MUST NOT** be > > # defined if SASL is configured for authentication of data transfer protocol > > # using non-privileged ports. > > # This will replace the hadoop.id.str Java property in secure mode. > > # export HDFS_DATANODE_SECURE_USER=hdfs > > > # Supplemental options for secure datanodes > > # By default, Hadoop uses jsvc which needs to know to launch a > > # server jvm. > > # export HDFS_DATANODE_SECURE_EXTRA_OPTS="-jvm server" > > > ### > > # NFS3 Gateway specific parameters > > ### > > # Specify the JVM options to be used when starting the NFS3 Gateway. 
> > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_NFS3_OPTS="" > > > # Specify the JVM options to be used when starting the Hadoop portmapper. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_PORTMAP_OPTS="-Xmx512m" > > > # Supplemental options for priviliged gateways > > # By default, Hadoop uses jsvc which needs to know to launch a > > # server jvm. > > # export HDFS_NFS3_SECURE_EXTRA_OPTS="-jvm server" > > > # On privileged gateways, user to run the gateway as after dropping > privileges > > # This will replace the hadoop.id.str Java property in secure mode. > > # export HDFS_NFS3_SECURE_USER=nfsserver > > > ### > > # ZKFailoverController specific parameters > > ### > > # Specify the JVM options to be used when starting the ZKFailoverController. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_ZKFC_OPTS="" > > > ### > > # QuorumJournalNode specific parameters > > ### > > # Specify the JVM options to be used when starting the QuorumJournalNode. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_JOURNALNODE_OPTS="" > > > ### > > # HDFS Balancer specific parameters > > ### > > # Specify the JVM options to be used when starting the HDFS Balancer. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_BALANCER_OPTS="" > > > ### > > # HDFS Mover specific parameters > > ### > > # Specify the JVM options to be used when starting the HDFS Mover. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_MOVER_OPTS="" > > > ### > > # Router-based HDFS Federation specific parameters > > # Specify the JVM options to be used when starting the RBF Routers. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_DFSROUTER_OPTS="" > > > ### > > # HDFS StorageContainerManager specific parameters > > ### > > # Specify the JVM options to be used when starting the HDFS Storage > Container Manager. > > # These options will be appended to the options specified as HADOOP_OPTS > > # and therefore may override any similar flags set in HADOOP_OPTS > > # > > # export HDFS_STORAGECONTAINERMANAGER_OPTS="" > > > ### > > # Advanced Users Only! > > ### > > > # > > # When building Hadoop, one can add the class paths to the commands > > # via this special env var: > > # export HADOOP_ENABLE_BUILD_PATHS="true" > > > # > > # To prevent accidents, shell commands be (superficially) locked > > # to only allow certain users to execute certain subcommands. > > # It uses the format of (command)_(subcommand)_USER. 
> > # > > # For example, to limit who can execute the namenode command, > > # export HDFS_NAMENODE_USER=hdfs > > export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 > > > # Enable s3 > > export HADOOP_OPTIONAL_TOOLS="hadoop-aws" > > echo "Ensure AWS Credentials are added to hadoop-env.sh and core-site.xml, > by running add-aws-keys.sh" > > export AWS_ACCESS_KEY_ID=KEY_PLACEHOLDER > > export AWS_SECRET_ACCESS_KEY=SECRET_PLACEHOLDER > > > > hdfs-site.xml > > <?xml version="1.0" encoding="UTF-8"?> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> > > <!-- > > Licensed under the Apache License, Version 2.0 (the "License"); > > you may not use this file except in compliance with the License. > > You may obtain a copy of the License at > > > http://www.apache.org/licenses/LICENSE-2.0 > > > Unless required by applicable law or agreed to in writing, software > > distributed under the License is distributed on an "AS IS" BASIS, > > WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > See the License for the specific language governing permissions and > > limitations under the License. See accompanying LICENSE file. > > --> > > > <!-- Put site-specific property overrides in this file. --> > > <configuration> > > <property> > > <name>dfs.data.dir</name> > > <value>/home/hdoop/dfsdata/namenode</value> > > </property> > > <property> > > <name>dfs.data.dir</name> > > <value>/home/hdoop/dfsdata/datanode</value> > > </property> > > <property> > > <name>dfs.replication</name> > > <value>1</value> > > </property> > > </configuration> > > > > mapred-site.xml > > <?xml version="1.0"?> > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> > > <!-- > > Licensed under the Apache License, Version 2.0 (the "License"); > > you may not use this file except in compliance with the License. > > You may obtain a copy of the License at > > > http://www.apache.org/licenses/LICENSE-2.0 > > > Unless required by applicable law or agreed to in writing, software > > distributed under the License is distributed on an "AS IS" BASIS, > > WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > See the License for the specific language governing permissions and > > limitations under the License. See accompanying LICENSE file. > > --> > > > <!-- Put site-specific property overrides in this file. 
--> > > <configuration> > > <property> > > <name>mapreduce.framework.name</name> > > <value>yarn</value> > > </property> > > <property> > > <name>yarn.app.mapreduce.am.env</name> > > <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> > > </property> > > <property> > > <name>mapreduce.map.env</name> > > <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> > > </property> > > <property> > > <name>mapreduce.reduce.env</name> > > <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> > > </property> > > <!-- > > <property> > > <name>mapreduce.application.classpath</name> > > > > <value>/home/hdoop/hadoop-3.3.0/etc/hadoop:/home/hdoop/hadoop-3.3.0/share/hadoop/common/lib/*:/home/hdoop/hadoop-3.3.0/share/hadoop/common/*:/home/hdoop/hadoop-3.3.0/share/hadoop/hdfs:/home/hdoop/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/home/hdoop/hadoop-3.3.0/share/hadoop/hdfs/*:/home/hdoop/hadoop-3.3.0/share/hadoop/mapreduce/*:/home/hdoop/hadoop-3.3.0/share/hadoop/yarn:/home/hdoop/hadoop-3.3.0/share/hadoop/yarn/lib/*:/home/hdoop/hadoop-3.3.0/share/hadoop/yarn/*:/home/hdoop/hadoop-3.3.0/bin:/home/hdoop/hadoop-3.3.0/sbin</value> > > </property> > > --> > > <property> > > <name>mapreduce.application.classpath</name> > > > > <value>home/hdoop/hadoop-3.3.0/etc/hadoop:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/*:/home/hdoop/hadoop-3.3.0//share/hadoop/common/*:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/*:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/*:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/lib/*:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/*:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/*:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/*:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/accessors-smart-1.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/animal-sniffer-annotations-1.17.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/asm-5.0.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/audience-annotations-0.5.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/avro-1.7.7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/checker-qual-2.5.2.jar:/home/hdoop/hadoop-3.3. 
0//share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-cli-1.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-codec-1.11.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-collections-3.2.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-compress-1.18.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-io-2.5.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-logging-1.1.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-math3-3.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-net-3.6.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/commons-text-1.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/curator-client-2.13.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/cu rator-framework-2.13.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/curator-recipes-2.13.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/dnsjava-2.1.7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/error_prone_annotations-2.2.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/failureaccess-1.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/gson-2.2.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/guava-27.0-jre.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/hadoop-annotations-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/hadoop-auth-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/httpclient-4.5.6.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/httpcore-4.4.10.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/j2objc-annotations-1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-annotations-2.9.8.jar:/h ome/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-core-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-databind-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jackson-xc-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jersey-core-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jersey-json-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jersey-server-1.19.jar:/home/hdoop/hadoop-3.3.0//share/ 
hadoop/common/lib/jersey-servlet-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jettison-1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-http-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-io-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-security-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-server-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-servlet-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-util-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-webapp-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jetty-xml-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jsch-0.1.54.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/json-smart-2.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jsp-api-2.1.jar:/home/hdoop/hadoop-3.3. 0//share/hadoop/common/lib/jsr305-3.0.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jsr311-api-1.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-admin-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-client-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-common-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-core-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-identity-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-server-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerb-util-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerby-config-1.0.1.jar:/home/hdoop/hado op-3.3.0//share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerby-util-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/log4j-1.2.17.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/metrics-core-3.2.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/netty-3.10.5.Final.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/paranamer-2.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/protobuf-java-2.5.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/re2j-1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/slf4j-api-1.7.25.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/l 
ib/snappy-java-1.0.5.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/stax2-api-3.1.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/token-provider-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/woodstox-core-5.0.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib/zookeeper-3.4.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/hadoop-common-3.3.0-tests.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/hadoop-common-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/hadoop-kms-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/hadoop-nfs-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/common/jdiff:/home/hdoop/hadoop-3.3.0//share/hadoop/common/lib:/home/hdoop/hadoop-3.3.0//share/hadoop/common/sources:/home/hdoop/hadoop-3.3.0//share/hadoop/common/webapps:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/accessors-smart-1.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/animal-sniffer-annotations-1.17.jar:/home/hdoop/hadoop-3.3.0//share/hado op/hdfs/lib/asm-5.0.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/audience-annotations-0.5.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/avro-1.7.7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/checker-qual-2.5.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-beanutils-1.9.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-cli-1.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-codec-1.11.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-compress-1.18.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-io-2.5.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-lang3-3.7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/home/hdoop/hadoop-3.3.0/ /share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-net-3.6.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/commons-text-1.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/curator-client-2.13.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/curator-framework-2.13.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/curator-recipes-2.13.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/dnsjava-2.1.7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/error_prone_annotations-2.2.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/failureaccess-1.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/gson-2.2.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/guava-27.0-jre.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/hadoop-annotations-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/hadoop-auth-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/home/hdoo 
p/hadoop-3.3.0//share/hadoop/hdfs/lib/httpclient-4.5.6.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/httpcore-4.4.10.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/j2objc-annotations-1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-annotations-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-core-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-databind-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jcip-annotat ions-1.0-1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jersey-core-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jersey-json-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jersey-server-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jettison-1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-http-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-io-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-security-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-server-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-servlet-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-util-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-util-ajax-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-webapp-9.3 .24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jetty-xml-9.3.24.v20180605.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jsch-0.1.54.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/json-smart-2.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/ho 
me/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/log4j-1.2.17.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/netty-3.10.5.Final.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/netty-all-4.0.52.Final.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/nimbus-jose-jwt-4.41.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/okio- 1.6.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/paranamer-2.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/re2j-1.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib/zookeeper-3.4.13.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-3.3.0-tests.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-client-3.3.0-tests.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-client-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-httpfs-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-native-client-3.3 .0-tests.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-native-client-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-nfs-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-rbf-3.3.0-tests.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/hadoop-hdfs-rbf-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/jdiff:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/lib:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/sources:/home/hdoop/hadoop-3.3.0//share/hadoop/hdfs/webapps:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/lib/junit-4.11.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-hs-3. 
3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0-tests.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-nativetask-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-client-uploader-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/jdiff:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/lib:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/lib-examples:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/sources:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/y arn/lib/aopalliance-1.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/bcpkix-jdk15on-1.60.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/bcprov-jdk15on-1.60.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/ehcache-3.3.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/fst-2.50.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/guice-4.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/guice-servlet-4.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/jackson-jaxrs-base-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.9.8.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/java-util-1.9.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/javax.inject-1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/jersey-client-1.19.jar:/home/hdoop/hadoop -3.3.0//share/hadoop/yarn/lib/jersey-guice-1.19.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/json-io-2.5.1.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/metrics-core-3.2.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/objenesis-1.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/snakeyaml-1.16.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-api-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-client-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-common-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-registry-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/had 
oop/yarn/hadoop-yarn-server-applicationhistoryservice-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-common-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-router-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-tests-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-services-api-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-services-core-3.3.0.jar:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/hadoop-yarn-submarine-3.3.0.jar:/home/hdoop/ha doop-3.3.0//share/hadoop/yarn/lib:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/sources:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/test:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/timelineservice:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/webapps:/home/hdoop/hadoop-3.3.0//share/hadoop/yarn/yarn-service-examples:/home/hdoop/hadoop-3.3.0//share/hadoop/mapreduce/sources/hadoop-mapreduce-client-app-3.3.0-sources.jar:/home/hdoop/hadoop-3.3.0//hadoop-tools/hadoop-aws/target/hadoop-aws-3.3.0.jar:/home/hdoop/hadoop-3.3.0/hadoop-tools/hadoop-aws/target/hadoop-aws-3.3.0.jar:/home/hdoop/hadoop-3.3.0/hadoop-tools/hadoop-aws/target/lib/aws-java-sdk-bundle-1.11.563.jar:/home/hdoop/nutch/build/lib/commons-jexl3-3.1.jar:/home/hdoop/nutch/lib/*</value> > > </property> > > > <property> > > <name>yarn.app.mapreduce.am.resource.mb</name> > > <value>512</value> > > </property> > > > <property> > > <name>mapreduce.map.memory.mb</name> > > <value>256</value> > > </property> > > > <property> > > <name>mapreduce.reduce.memory.mb</name> > > <value>256</value> > > </property> > > > <!--from NutchHadoop Tutorial --> > > <property> > > <name>mapred.system.dir</name> > > <value>/home/hdoop/dfsdata/mapreduce/system</value> > > </property> > > > <property> > > <name>mapred.local.dir</name> > > <value>/home/hdoop/dfsdata/mapreduce/local</value> > > </property> > > > </configuration> > > > > workers > > hadoop02N <http://hadoop02.o7.castle.fm/>ame > > hadoop01N <http://hadoop01.o7.castle.fm/>ame > > > > yarn-site.xml > > <?xml version="1.0"?> > > <!-- > > Licensed under the Apache License, Version 2.0 (the "License"); > > you may not use this file except in compliance with the License. > > You may obtain a copy of the License at > > > http://www.apache.org/licenses/LICENSE-2.0 > > > Unless required by applicable law or agreed to in writing, software > > distributed under the License is distributed on an "AS IS" BASIS, > > WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > > See the License for the specific language governing permissions and > > limitations under the License. See accompanying LICENSE file. 
> > --> > > <configuration> > > <property> > > <name>yarn.nodemanager.aux-services</name> > > <value>mapreduce_shuffle</value> > > </property> > > <property> > > <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> > > <value>org.apache.hadoop.mapred.ShuffleHandler</value> > > </property> > > <property> > > <name>yarn.resourcemanager.hostname</name> > > <value>IP_PLACEHOLDER</value> > > </property> > > <property> > > <name>yarn.acl.enable</name> > > <value>0</value> > > </property> > > <property> > > <name>yarn.nodemanager.env-whitelist</name> > > > <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PERPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value> > > </property> > > > <!--<property> > > <name>yarn.application.classpath</name> > > > <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$USS_HOME/*,$USS_CONF</value> > > </property>--> > > > <property> > > <name>yarn.nodemanager.resource.memory-mb</name> > > <value>1536</value> > > </property> > > > <property> > > <name>yarn.scheduler.maximum-allocation-mb</name> > > <value>1536</value> > > </property> > > > <property> > > <name>yarn.scheduler.minimum-allocation-mb</name> > > <value>128</value> > > </property> > > > <property> > > <name>yarn.nodemanager.vmem-check-enabled</name> > > <value>false</value> > > </property> > > <property> > > <name>yarn.nodemanager.local-dirs</name> > > <value>${hadoop.tmp.dir}/nm-local-dir</value> > > </property> > > > </configuration> > > > On Thu, Jun 17, 2021 at 11:55 AM Clark Benham <cl...@thehive.ai> wrote: > > > > > Hi Sebastian, > > > > NUTCH_HOME=~/nutch; the local filesystem. I am using a plain, pre-built > > hadoop. > > There's no "mapreduce.job.dir" I can grep in Hadoop 3.2.1,3.3.0, or > > Nutch-1.18, 1.19, but mapreduce.job.hdfs-servers defaults to > > ${fs.defaultFS}, so s3a://temp-crawler in our case. > > The plugin loader doesn't appear to be able to read from s3 in nutch-1.18 > > with hadoop-3.2.1[1]. > > > > Using java & javac 11 with hadoop-3.3.0 downloaded and untared and a > > nutch-1.19 I built: > > I can run a mapreduce job on S3; and a Nutch job on hdfs, but running > > nutch on S3 still gives "URLNormalizer not found" with the plugin dir on > > the local filesystem or on S3a. > > > > How would you recommend I go about getting the plugin loader to read from > > other file systems? > > > > [1] I still get 'x point org.apache.nutch.net.URLNormalizer not found' > > (same stack trace as previous email) with > > `<name>plugin.folders</name> > > <value>s3a://temp-crawler/user/hdoop/nutch-plugins</value>` > > set in my nutch-site.xml while `hadoop fs -ls > > s3a://temp-crawler/user/hdoop/nutch-plugins` lists all the plugins as there. > > > > > > For posterity: > > I got hadoop-3.3.0 working with a S3 backend by: > > > > cd ~/hadoop-3.3.0 > > > > cp ./share/hadoop/tools/lib/hadoop-aws-3.3.0.jar ./share/hadoop/common/lib > > > > cp ./share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.563.jar > > ./share/hadoop/common/lib > > to solve "Class org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory not > > found" despite the class existing in > > ~/hadoop-3.3.0/share/hadoop/tools/lib/hadoop-aws-3.3.0.jar checking it's > > on the classpath with `hadoop classpath | tr ":" "\n" | grep > > share/hadoop/tools/lib/hadoop-aws-3.3.0.jar` as well as adding it to > > hadoop-env.sh. 
> > see > > https://stackoverflow.com/questions/58415928/spark-s3-error-java-lang-classnotfoundexception-class-org-apache-hadoop-f > > > > On Tue, Jun 15, 2021 at 2:01 AM Sebastian Nagel > > <wastl.na...@googlemail.com.invalid> wrote: > > > >> > The local file system? Or hdfs:// or even s3:// resp. s3a://? > >> > >> Also important: the value of "mapreduce.job.dir" - it's usually > >> on hdfs:// and I'm not sure whether the plugin loader is able to > >> read from other filesystems. At least, I haven't tried. > >> > >> > >> On 6/15/21 10:53 AM, Sebastian Nagel wrote: > >> > Hi Clark, > >> > > >> > sorry, I should read your mail until the end - you mentioned that > >> > you downgraded Nutch to run with JDK 8. > >> > > >> > Could you share to which filesystem does NUTCH_HOME point? > >> > The local file system? Or hdfs:// or even s3:// resp. s3a://? > >> > > >> > Best, > >> > Sebastian > >> > > >> > > >> > On 6/15/21 10:24 AM, Clark Benham wrote: > >> >> Hi, > >> >> > >> >> > >> >> I am trying to run Nutch-1.19 on hadoop-3.2.1 with an S3 > >> >> backend/filesystem; however I get an error ‘URLNormalizer class not > >> found’. > >> >> I have edited nutch-site.xml so this plugin should be included: > >> >> > >> >> <property> > >> >> > >> >> <name>plugin.includes</name> > >> >> > >> >> > >> >> > >> <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|mimetype-filter|urlnormalizer|urlnormalizer-basic|.*|nutch-extensionpoints</value> > >> > >> >> > >> >> > >> >> > >> >> </property> > >> >> > >> >> and then built on both nodes (I only have 2 machines). I’ve > >> successfully > >> >> run Nutch locally and in distributed mode using HDFS, and I’ve run a > >> >> mapreduce job with S3 as hadoop’s file system. > >> >> > >> >> > >> >> I thought it was possible nutch is not reading nutch-site.xml because I > >> >> resolve an error by setting the config through the cli, despite this > >> >> duplicating nutch-site.xml. > >> >> > >> >> The command: > >> >> > >> >> `hadoop jar $NUTCH_HOME/runtime/deploy/apache-nutch-1.19-SNAPSHOT.job > >> >> org.apache.nutch.fetcher.Fetcher > >> >> crawl/crawldb crawl/segments` > >> >> > >> >> throws > >> >> > >> >> `java.lang.IllegalArgumentException: Fetcher: No agents listed in ' > >> >> http.agent.name' property` > >> >> > >> >> while if I pass a value in for http.agent.name with > >> >> `-Dhttp.agent.name=myScrapper`, > >> >> (making the command `hadoop jar > >> >> $NUTCH_HOME/runtime/deploy/apache-nutch-1.19-SNAPSHOT.job > >> >> org.apache.nutch.fetcher.Fetcher > >> >> -Dhttp.agent.name=clark crawl/crawldb crawl/segments`), I get an > >> error > >> >> about there being no input path, which makes sense as I haven’t been > >> able > >> >> to generate any segments. > >> >> > >> >> > >> >> However this method of setting nutch config’s doesn’t work for > >> injecting > >> >> URLs; eg: > >> >> > >> >> `hadoop jar $NUTCH_HOME/runtime/deploy/apache-nutch-1.19-SNAPSHOT.job > >> >> org.apache.nutch.crawl.Injector > >> >> -Dplugin.includes=".*" crawl/crawldb urls` > >> >> > >> >> fails with the same “URLNormalizer” not found. > >> >> > >> >> > >> >> I tried copying the plugin dir to S3 and setting > >> >> <name>plugin.folders</name> to be a path on S3 without success. 
(I > >> expect > >> >> the plugin to be bundled with the .job so this step should be > >> unnecessary) > >> >> > >> >> > >> >> The full stack trace for `hadoop jar > >> >> $NUTCH_HOME/runtime/deploy/apache-nutch-1.19-SNAPSHOT.job > >> >> org.apache.nutch.crawl.Injector > >> >> crawl/crawldb urls`: > >> >> > >> >> SLF4J: Class path contains multiple SLF4J bindings. > >> >> > >> >> SLF4J: Found binding in > >> >> > >> [jar:file:/home/hdoop/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class] > >> >> > >> >> SLF4J: Found binding in > >> >> > >> [jar:file:/home/hdoop/apache-nutch-1.18/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class] > >> >> > >> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > >> >> explanation. > >> >> > >> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > >> >> > >> >> #Took out multiply Info messages > >> >> > >> >> 2021-06-15 07:06:07,842 INFO mapreduce.Job: Task Id : > >> >> attempt_1623740678244_0001_m_000001_0, Status : FAILED > >> >> > >> >> Error: java.lang.RuntimeException: x point > >> >> org.apache.nutch.net.URLNormalizer not found. > >> >> > >> >> at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:145) > >> >> > >> >> at > >> org.apache.nutch.crawl.Injector$InjectMapper.setup(Injector.java:139) > >> >> > >> >> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) > >> >> > >> >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > >> >> > >> >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > >> >> > >> >> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > >> >> > >> >> at java.security.AccessController.doPrivileged(Native Method) > >> >> > >> >> at javax.security.auth.Subject.doAs(Subject.java:422) > >> >> > >> >> at > >> >> > >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > >> >> > >> >> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > >> >> > >> >> > >> >> #This error repeats 6 times total, 3 times for each node > >> >> > >> >> > >> >> 2021-06-15 07:06:26,035 INFO mapreduce.Job: map 100% reduce 100% > >> >> > >> >> 2021-06-15 07:06:29,067 INFO mapreduce.Job: Job job_1623740678244_0001 > >> >> failed with state FAILED due to: Task failed > >> >> task_1623740678244_0001_m_000001 > >> >> > >> >> Job failed as tasks failed. 
failedMaps:1 failedReduces:0 killedMaps:0 > >> >> killedReduces: 0 > >> >> > >> >> > >> >> 2021-06-15 07:06:29,190 INFO mapreduce.Job: Counters: 14 > >> >> > >> >> Job Counters > >> >> > >> >> Failed map tasks=7 > >> >> > >> >> Killed map tasks=1 > >> >> > >> >> Killed reduce tasks=1 > >> >> > >> >> Launched map tasks=8 > >> >> > >> >> Other local map tasks=6 > >> >> > >> >> Rack-local map tasks=2 > >> >> > >> >> Total time spent by all maps in occupied slots (ms)=63196 > >> >> > >> >> Total time spent by all reduces in occupied slots (ms)=0 > >> >> > >> >> Total time spent by all map tasks (ms)=31598 > >> >> > >> >> Total vcore-milliseconds taken by all map tasks=31598 > >> >> > >> >> Total megabyte-milliseconds taken by all map tasks=8089088 > >> >> > >> >> Map-Reduce Framework > >> >> > >> >> CPU time spent (ms)=0 > >> >> > >> >> Physical memory (bytes) snapshot=0 > >> >> > >> >> Virtual memory (bytes) snapshot=0 > >> >> > >> >> 2021-06-15 07:06:29,195 ERROR crawl.Injector: Injector job did not > >> succeed, > >> >> job status: FAILED, reason: Task failed > >> task_1623740678244_0001_m_000001 > >> >> > >> >> Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 > >> >> killedReduces: 0 > >> >> > >> >> > >> >> 2021-06-15 07:06:29,562 ERROR crawl.Injector: Injector: > >> >> java.lang.RuntimeException: Injector job did not succeed, job status: > >> >> FAILED, reason: Task failed task_1623740678244_0001_m_000001 > >> >> > >> >> Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 > >> >> killedReduces: 0 > >> >> > >> >> > >> >> at org.apache.nutch.crawl.Injector.inject(Injector.java:444) > >> >> > >> >> at org.apache.nutch.crawl.Injector.run(Injector.java:571) > >> >> > >> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > >> >> > >> >> at org.apache.nutch.crawl.Injector.main(Injector.java:535) > >> >> > >> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > >> >> > >> >> at > >> >> > >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > >> >> > >> >> at > >> >> > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >> >> > >> >> at java.lang.reflect.Method.invoke(Method.java:498) > >> >> > >> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:323) > >> >> > >> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:236) > >> >> > >> >> > >> >> > >> >> > >> >> P.S. > >> >> > >> >> I am using a downloaded hadoop-3.2.1; and the only odd thing about my > >> nutch > >> >> build is that I had to replace all instances of “javac.verion” with > >> >> “ant.java.version”; as the javac version was 11 to java’s 1.8 giving > >> the > >> >> error ‘javac invalid target release: 11’: > >> >> > >> >> grep -rl "javac.version" . --include "*.xml" | xargs sed -i > >> >> s^javac.version^ant.java.version^g > >> >> > >> >> grep -rl “ant.ant” . --include "*.xml"| xargs sed -i s^ant.ant.^ant.^g > >> >> > >> > > >> > >> >