Hi Isabelle,
thanks for asking. Yes, it's complex. But maybe I'm also a bit overcautious. Let
me explain...
If anybody runs a Hadoop cluster on Java 11 and Nutch is compiled for Java 17,
Nutch cannot run on that cluster anymore, at least not without extra work, for
example, dockerizing the Nutch tasks [1].
The Hadoop services (HDFS name and data node, YARN resource and node manager,
etc.) and the Nutch tasks each run in their own JVM, but JAVA_HOME and
parts of the class path are shared.
Hadoop distributions are usually conservative: many support Java 8 and 11,
few Java 17. One I know of, Apache Bigtop [2], is still on Java 8 and 11.
And while I might be able to adapt Nutch myself, others may not.
I want to avoid a situation where a forced upgrade to Java 17 breaks users'
Nutch setups on Hadoop.
In local mode this isn't an issue: just install a second Java version.
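To illustrate what I mean (the paths are examples and will differ per installation):

```shell
# Local mode: give Nutch its own JDK without touching the system default.
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
export PATH="$JAVA_HOME/bin:$PATH"
bin/nutch inject crawl/crawldb urls/   # runs in this shell's Java 17 JVM
```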
> I've had occasion to run a few small crawls using the default embedded
> Hadoop, both with JVM 17 and 21.
Thanks, good to know.
> Actually, at my former employer, Nutch is running on Java 21
But is this locally or deployed on a Hadoop cluster?
> Otherwise, if it's compile-time, we don't have to wait for Hadoop to
> compile with Java 17.
Of course. But again: for distributed mode, the usual way is to compile Nutch
yourself, because the job file in runtime/deploy includes the modified
nutch-site.xml and other configuration files. This adds complexity to
documenting the deployment. And it would only help for some upgrades, e.g.
JUnit 6 (NUTCH-3145); others would also require a Java 17 runtime
(index-geoip / NUTCH-3064).
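For context, the usual deploy-mode workflow is roughly this (a sketch; the
exact job file name depends on the Nutch version):

```shell
# Build the job file with the modified conf/ (nutch-site.xml etc.) baked in:
ant clean runtime
# Submit a Nutch step to the cluster via the job file:
hadoop jar runtime/deploy/apache-nutch-*.job \
    org.apache.nutch.crawl.Injector crawldb urls
```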
Thanks for the feedback!
More is very welcome!
Especially any advance notice if an upgrade to Java 17 would break your setup.
Thanks!
Best,
Sebastian
[1]
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainers.html
[2] https://bigtop.apache.org/
On 2/4/26 15:52, Isabelle Giguere wrote:
Hi;
I'm trying to understand exactly what the issue is.
I've had occasion to run a few small crawls using the default embedded
Hadoop, both with JVM 17 and 21. Actually, at my former employer, Nutch is
running on Java 21, last I saw.
In a full production environment, if Nutch is using an external Hadoop, and
if Hadoop has difficulties with Java 17+ at runtime, then Nutch and
Hadoop can each run on their own JVM.
Otherwise, if it's compile-time, we don't have to wait for Hadoop to
compile with Java 17.
Or am I completely off-track?
Isabelle Giguère
On Tue, Feb 3, 2026 at 16:52, Sebastian Nagel <[email protected]>
wrote:
Hi everybody,
the current Nutch development is ready for Java 17
with NUTCH-2971 fixed - thanks Isabelle!
For now, Nutch does not require Java 17 at compile or
run time; Java 11 is still sufficient.
This is good because Hadoop still does not guarantee
full support of Java 17 [1].
However, staying compatible with Java 11 becomes a burden
because more and more dependencies require an upgrade to Java 17.
We already have two PRs open which are great improvements
(thanks to Lewis!) but would require Java 17:
- index-geoip NUTCH-3064 / PR #825 [2]
- JUnit 6 NUTCH-3145 / PR #883 [3]
We now have the following options for the next release (1.22):
1. stay on Java 11
2. require Java 17 at compile time, but compile using "-target 11"
to stay compatible with Java 11 at runtime
3. drop support for Java 11 and switch to Java 17
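A side note on option 2: with a Java 17 JDK, "--release 11" is safer than
plain "-source 11 -target 11", because it also links against the Java 11 API
and so catches accidental use of methods added after Java 11. A sketch, with
a placeholder class name:

```shell
# Emits Java 11 bytecode AND checks calls against the Java 11 API:
javac --release 11 -d build/classes src/java/org/apache/nutch/SomeClass.java
# "-source 11 -target 11" alone would not flag uses of newer JDK APIs.
```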
I'm leaning towards option 1 and will try to release Nutch 1.22
in the next few weeks. After the release, we can go to option 3.
Please share your thoughts and opinions!
~Sebastian
[1] https://issues.apache.org/jira/browse/HADOOP-17177
[2] https://github.com/apache/nutch/pull/825
[3] https://github.com/apache/nutch/pull/883