Hi Clark,
This is a lot of information... thank you for compiling it all.
Ideally the version of Hadoop being used with Nutch should ALWAYS match the
Hadoop binaries referenced in
https://github.com/apache/nutch/blob/master/ivy/ivy.xml. This way you won't run
into classpath issues.
I would ...
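For illustration, the Hadoop artifacts are pinned in ivy/ivy.xml with entries
roughly like the ones below (the rev shown is the 3.1.3 mentioned elsewhere in
this thread; check the ivy.xml of your Nutch release for the exact version):

  <!-- ivy/ivy.xml (abridged sketch): the rev must match the Hadoop
       version installed on the cluster to avoid classpath conflicts -->
  <dependency org="org.apache.hadoop" name="hadoop-common"
              rev="3.1.3" conf="*->default"/>
  <dependency org="org.apache.hadoop" name="hadoop-hdfs"
              rev="3.1.3" conf="*->default"/>
  <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core"
              rev="3.1.3" conf="*->default"/>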
Hi Clark,
thanks for summarizing this discussion and sharing the final configuration!
Good to know that it's possible to run Nutch on Hadoop using S3A without
using HDFS (no namenode/datanodes running).
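For the archives: a minimal core-site.xml sketch for such an HDFS-less setup
could look like the following (bucket name taken from this thread; the
fs.s3a.* keys are the standard hadoop-aws credential properties - adjust them
to your authentication method):

  <!-- core-site.xml (sketch): an S3 bucket as the default filesystem,
       so no namenode/datanodes are required -->
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>s3a://temp-crawler</value>
    </property>
    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value> <!-- placeholder -->
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value> <!-- placeholder -->
    </property>
  </configuration>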
Best,
Sebastian
Hi All,
Sebastian helped fix my issue: using S3 as a backend, I was able to get
Nutch 1.19 working with pre-built Hadoop 3.3.0 and Java 11. There was an
oddity: nutch-1.19 had 11 Hadoop 3.1.3 jars, e.g.
hadoop-hdfs-3.1.3.jar, hadoop-yarn-api-3.1.3.jar, ...; this made running
`hadoop version` ...
Hi Sebastian,
NUTCH_HOME=~/nutch, i.e. the local filesystem. I am using a plain, pre-built
Hadoop.
There's no "mapreduce.job.dir" I can grep for in Hadoop 3.2.1/3.3.0 or
Nutch 1.18/1.19, but mapreduce.job.hdfs-servers defaults to
${fs.defaultFS}, so s3a://temp-crawler in our case.
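If it helps, that default can also be made explicit; a minimal
mapred-site.xml sketch, assuming the same bucket:

  <!-- mapred-site.xml (sketch): mapreduce.job.hdfs-servers falls back to
       ${fs.defaultFS} when unset; pinning it makes the S3A setup explicit -->
  <property>
    <name>mapreduce.job.hdfs-servers</name>
    <value>s3a://temp-crawler</value>
  </property>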
The plugin loader ...
> The local file system? Or hdfs:// or even s3:// resp. s3a://?
> Also important: the value of "mapreduce.job.dir" - it's usually
> on hdfs:// and I'm not sure whether the plugin loader is able to
> read from other filesystems. At least, I haven't tried.
On 6/15/21 10:53 AM, Sebastian Nagel wrote:
Hi Clark,
sorry, I should have read your mail to the end - you mentioned that
you downgraded Nutch to run with JDK 8.
Could you share which filesystem NUTCH_HOME points to?
The local file system? Or hdfs:// or even s3:// resp. s3a://?
Best,
Sebastian
On 6/15/21 10:24 AM, Clark Benham wrote: ...
Hi Clark,
the class URLNormalizer is not in a plugin - it's part of Nutch core and
defines the interface for URL normalizer plugins. Looks like there's
something fundamentally wrong, not only with the plugins.
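For context, a URL normalizer plugin merely implements that core interface,
which it references as the extension point in its plugin.xml - roughly like
this (abridged sketch based on the urlnormalizer-basic plugin; exact ids and
attributes may differ per release):

  <plugin id="urlnormalizer-basic" name="Basic URL Normalizer"
          version="1.0.0" provider-name="org.apache.nutch">
    <runtime>
      <library name="urlnormalizer-basic.jar">
        <export name="*"/>
      </library>
    </runtime>
    <!-- the extension point is the core class URLNormalizer;
         the plugin only contributes an implementation of it -->
    <extension id="org.apache.nutch.net.urlnormalizer.basic"
               name="Basic URL Normalizer"
               point="org.apache.nutch.net.URLNormalizer">
      <implementation id="BasicURLNormalizer"
                      class="org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer"/>
    </extension>
  </plugin>

So if URLNormalizer itself cannot be found, the problem is in the core
classpath, before any plugin is even loaded.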
> I am trying to run Nutch-1.19 on hadoop-3.2.1 with an S3
Are you aware that the ...