I wanted to discuss what I am looking to do with Alpine. As you may know, I am a bit long-winded, but there is a narrative here about what I am trying to do with Hadoop and packaging.
When I got into Linux, Red Hat (at version 9?) split into Fedora and Red Hat Enterprise Linux. What emerged from this situation: Fedora was the test bed, so a Fedora release would be the bleeding edge that later became a RHEL release. Also around that time CentOS became 'the' fork, and many businesses were completely OK running CentOS. There were a number of people doing extended packaging, like the DAG RPMs. There were only a few 'stubborn' pieces of software that were tied hard to "needing" RHEL. Effectively you could find an RPM from any of those places (RHEL, DAG, CentOS) and they would 'generally' work together. SUSE and Gentoo were there and Ubuntu came up, but none of those are my bread and butter, so I cannot speak to them.

We all know that over the past few years "containers" have arrived. This isn't a unique opinion of mine: there is a lot of "misguided" packaging into containers, like rebuilding a 9 GB image to change a 3-line config. We used to achieve "immutable deployments" by installing all the user software into /opt/hadoop and ONLY altering that through one process, 'ansible'.

I was in a regulated environment, and those environments are very serious about OSS vulnerability scanning. Not just a scan once a quarter, or once a year. Not just a scan and "asking nicely" to try to clean it up. Constant scanning, and asking for impact assessments AND timelines for remediation. (Note: I see the OWASP plugin is in Hadoop trunk, but it is itself versions behind. I will send a PR.)

The thing about these environments: 1) Less is more! 2) It is easier to constantly upgrade than to explain.

Example: there is a CVE on the ZooKeeper in Hadoop. If you read about the CVE, it is only inside the "optional" admin server:

CVE-2025-58457: Insufficient Permission Check. This vulnerability allows an *authorized* client with low-level privileges to execute sensitive snapshot and restore commands on the Admin Server without the required root (ALL) permissions. The primary risk is the disclosure of the cluster state via unauthorized snapshots.
It takes more time to constantly read the sometimes cryptic CVE reports, and EXPLAIN to people that you are not affected, than to keep patching! Remember lesson #1 (less is more). The problem with having 4GB of stuff in /usr/bin is that SOMETHING always has a vulnerability, and it is usually something you don't use!

Bigger isn't better with Hadoop either. My 80GB SSD has, say, 25GB free, but having a 5GB OverlayFS for the Hadoop build starts choking down how many things I can test at once. Keeping things like the minimr cluster assembly "lean" has a compounding effect, etc.

On to Alpine. Sooo, in the enterprise they love Red Hat. It's a vendor. You can pay them! CYA! However, RHEL isn't a container distro. It is a great distro, but so expansive that the minimal install is maybe something like 5GB. I am sure you can pare it down, but that is not its bread and butter. Then there is the vendor 'situation' involving CentOS: is it Rocky, is it Alma? Personally, I am "divesting" from 6GB "distros", mostly because of reason #1: too much stuff that isn't useful and only creates vulnerabilities.

What I am doing is trying to push down into containers:

https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop/compositions/ha_rm_zk_pki_tls

My goal is not to service "every" possible Hadoop setup like a Cloudera Manager can, only to provide these archetype setups. HA YARN is an archetype; HA NN with 3 journal nodes is an archetype.

The downside of Alpine is musl: it forces you to break lots of assumptions. However, it also finds other things. Our "find -l", run when the container starts to list the directory, is not portable to Alpine. It is also forcing a revisit of some C code, which turns out not to be portable anyway.

So I would ask the group: if I don't fall off a cliff and I get the Alpine support to a decent point, can it be added as one of the "official" supported build envs? We could make this decision in 4 months or so.

Thanks,
Edward
