I have been testing this all week, and a -1 until some very minor changes
go in.


   1. build the arm64 binaries with the same jar artifacts as the x86 one
   2. include ad8b6541117b HADOOP-18088. Replace log4j 1.x with reload4j.
   3. include 80b4bb68159c HADOOP-19084. Prune hadoop-common transitive
   dependencies


For #1 we have automation there in my client-validator module, which I have
moved to be a hadoop-managed project and tried to make more
manageable
https://github.com/apache/hadoop-release-support

This contains an ant project to perform a lot of the documented build
stages, including using SCP to copy down an x86 release tarball and make a
signed copy of this containing (locally built) arm artifacts.

Although that only works with my development environment (macbook m1 laptop
and remote ec2 server), it should be straightforward to make it more
flexible.

It also includes and tests a maven project which imports many of the
hadoop-* pom files and run some test with it; this caught some problems
with exported slf4j and log4j2 artifacts getting into the classpath. That
is: hadoop-common pulling in log4j 1.2 and 2.x bindings.

HADOOP-19084 fixes this; the build file now includes a target to scan the
dependencies and fail if "forbidden" artifacts are found. I have not been
able to stop logback ending on the transitive dependency list, but at least
there is only one slf4j there.

HADOOP-18088. Replace log4j 1.x with reload4j switches over to reload4j
while the move to v2 is still something we have to consider a WiP.

I have tried doing some other changes to the packaging this week
- creating a lean distro without the AWS SDK
- trying to get protobuf-2.5 out of yarn-api
However, I think it is too late to try applying patches this risky.

I Believe we should get the 3.4.0 release out for people to start playing
with while we rapidly iterate 3.4.1 release out with
- updated dependencies (where possible)
- separate "lean" and "full" installations, where "full" includes all the
cloud connectors and their dependencies; the default is lean and doesn't.
That will cut the default download size in half.
- critical issues which people who use the 3.4.0 release raise with us.

That is: a packaging and bugs release, with a minimal number of new
features.

I've created HADOOP-19087
<https://issues.apache.org/jira/browse/HADOOP-19087> to cover this,
I'm willing to get my hands dirty here -Shilun Fan and Xiaoqiao He have put
a lot of work on 3.4.0 and probably need other people to take up the work
for next release. Who else is willing to participate? (Yes Mukund, I have
you in mind too)

One thing I would like to visit is: what hadoop-tools modules can we cut?
Are rumen and hadoop-streaming being actively used? Or can we consider them
implicitly EOL and strip. Just think of the maintenance effort we would
save.

---

Incidentally, I have tested the arm stuff on my raspberry pi5 which is now
running 64 bit linux. I believe it is the first time we have qualified a
Hadoop release with the media player under someone's television.

On Thu, 15 Feb 2024 at 20:41, Mukund Madhav Thakur <mtha...@cloudera.com>
wrote:

> Thanks, Shilun for putting this together.
>
> Tried the below things and everything worked for me.
>
> validated checksum and gpg signature.
> compiled from source.
> Ran AWS integration tests.
> untar the binaries and able to access objects in S3 via hadoop fs commands.
> compiled gcs-connector successfully using the 3.4.0 version.
>
> qq: what is the difference between RC1 and RC2? apart from some extra
> patches.
>
>
>
> On Thu, Feb 15, 2024 at 10:58 AM slfan1989 <slfan1...@apache.org> wrote:
>
>> Thank you for explaining this part!
>>
>> hadoop-3.4.0-RC2 used the validate-hadoop-client-artifacts tool to
>> generate
>> the ARM tar package, which should meet expectations.
>>
>> We also look forward to other members helping to verify.
>>
>> Best Regards,
>> Shilun Fan.
>>
>> On Fri, Feb 16, 2024 at 12:22 AM Steve Loughran <ste...@cloudera.com>
>> wrote:
>>
>> >
>> >
>> > On Mon, 12 Feb 2024 at 15:32, slfan1989 <slfan1...@apache.org> wrote:
>> >
>> >>
>> >>
>> >> Note, because the arm64 binaries are built separately on a different
>> >> platform and JVM, their jar files may not match those of the x86
>> >> release -and therefore the maven artifacts. I don't think this is
>> >> an issue (the ASF actually releases source tarballs, the binaries are
>> >> there for help only, though with the maven repo that's a bit blurred).
>> >>
>> >> The only way to be consistent would actually untar the x86.tar.gz,
>> >> overwrite its binaries with the arm stuff, retar, sign and push out
>> >> for the vote.
>> >
>> >
>> >
>> > that's exactly what the "arm.release" target in my client-validator
>> does.
>> > builds an arm tar with the x86 binaries but the arm native libs, signs
>> it.
>> >
>> >
>> >
>> >> Even automating that would be risky.
>> >>
>> >>
>> > automating is the *only* way to do it; apache ant has everything needed
>> > for this including the ability to run gpg.
>> >
>> > we did this on the relevant 3.3.x releases and nobody has yet
>> complained...
>> >
>>
>

Reply via email to