CC user@f.a.o

Is anyone aware of something that blocks us from doing the upgrade?
D.

On Tue, Dec 21, 2021 at 5:50 PM David Morávek <david.mora...@gmail.com> wrote:

> Hi Martijn,
>
> from personal experience, most Hadoop users are lagging behind the release
> lines by a lot, because upgrading a Hadoop cluster is not really a simple
> task to achieve. I think for now, we can stay a bit conservative; nothing
> blocks us from using 2.8.5, as we don't use any "newer" APIs in the code.
>
> As for Till's concern, we can still wrap the reflection-based logic so that
> it is skipped in case of "NoClassDefFound" instead of "ClassNotFound" as we
> do now.
>
> D.
>
> On Tue, Dec 14, 2021 at 5:23 PM Martijn Visser <mart...@ververica.com>
> wrote:
>
>> Hi David,
>>
>> Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
>> considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines
>>
>> On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> > Hi David,
>> >
>> > I think we haven't updated our Hadoop dependencies in a long time. Hence,
>> > it is probably time to do so. So +1 for upgrading to the latest patch
>> > release.
>> >
>> > If newer 2.x Hadoop versions are compatible with 2.y with x >= y, then I
>> > don't see a problem with dropping support for pre-bundled Hadoop versions
>> > < 2.8. This could indeed help us decrease our build matrix a bit and thus
>> > save some build time.
>> >
>> > Concerning simplifying our code base to get rid of reflection logic etc.,
>> > we still might have to add a safeguard for features that are not supported
>> > by earlier versions. According to the docs
>> >
>> > > YARN applications that attempt to use new APIs (including new fields in
>> > > data structures) that have not yet been deployed to the cluster can
>> > > expect link exceptions
>> >
>> > we can see link exceptions.
>> > We could get around this by saying that Flink no longer supports
>> > Hadoop < 2.8. But this should be checked with our users on the user ML
>> > at least.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Tue, Dec 14, 2021 at 9:25 AM David Morávek <d...@apache.org> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'd like to start a discussion about upgrading the minimum Hadoop
>> > > version that Flink supports.
>> > >
>> > > Even though the default value for the `hadoop.version` property is set
>> > > to 2.8.3, we're still ensuring both runtime and compile compatibility
>> > > with Hadoop 2.4.x with the scheduled pipeline [1].
>> > >
>> > > Here is a list of dates of the latest releases for each minor version
>> > > up to 2.8.x:
>> > >
>> > > - Hadoop 2.4.1: Last commit on 6/30/2014
>> > > - Hadoop 2.5.2: Last commit on 11/15/2014
>> > > - Hadoop 2.6.5: Last commit on 10/11/2016
>> > > - Hadoop 2.7.7: Last commit on 7/18/2018
>> > > - Hadoop 2.8.5: Last commit on 9/8/2018
>> > >
>> > > Since then there have been two more minor releases in the 2.x branch
>> > > and four more minor releases in the 3.x branch.
>> > >
>> > > Supporting the older versions involves reflection-based "hacks" for
>> > > supporting multiple versions.
>> > >
>> > > My proposal would be changing the minimum supported version *to 2.8.5*.
>> > > This should simplify the Hadoop-related codebase and simplify the CI
>> > > build infrastructure, as we won't have to test for the older versions.
>> > >
>> > > Please note that this only involves a minimal *client side*
>> > > compatibility. The wire protocol should remain compatible with earlier
>> > > versions [2], so we should be able to talk with any servers in the 2.x
>> > > major branch.
>> > >
>> > > One small note for the 2.8.x branch: some of the classes we need are
>> > > only available in version 2.8.4 and above, but I'm not sure we should
>> > > take an eventual need for upgrading a patch version into consideration
>> > > here, because both 2.8.4 and 2.8.5 are pretty old.
>> > >
>> > > WDYT, is it already time to upgrade? Looking forward to any thoughts on
>> > > the topic!
>> > >
>> > > [1]
>> > > https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
>> > > [2]
>> > > https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
>> > >
>> > > Best,
>> > > D.
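
For reference, the safeguard discussed in this thread (skipping version-specific Hadoop logic when a class or method is missing at link time, not only when a reflective lookup fails) could be sketched roughly as follows. This is only an illustration under assumptions: `OptionalFeature`, `tryCall` and `isClassPresent` are hypothetical names, not existing Flink or Hadoop APIs. It relies on `NoClassDefFoundError` and `NoSuchMethodError` both being subclasses of `java.lang.LinkageError`.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper; not part of Flink or Hadoop. */
final class OptionalFeature {
    private OptionalFeature() {}

    /**
     * Runs a call that may touch an API missing from the Hadoop version on the
     * classpath. Returns empty instead of failing when the JVM reports a link
     * problem (NoClassDefFoundError, NoSuchMethodError, NoSuchFieldError, ...).
     */
    static <T> Optional<T> tryCall(Supplier<T> call) {
        try {
            return Optional.ofNullable(call.get());
        } catch (LinkageError e) {
            // The feature is not available in this environment; skip it.
            return Optional.empty();
        }
    }

    /**
     * Reflection-style probe: checks whether a class can be loaded without
     * initializing it. Covers both the checked ClassNotFoundException and
     * link errors raised while resolving the class.
     */
    static boolean isClassPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

A caller would wrap a direct call to a newer API in `tryCall(...)` and fall back to the older behaviour when the result is empty. Depending on the JVM, a link error may surface when the enclosing class is loaded or verified rather than at the call itself, so the guarded code may need to live in its own small class.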