Hadoop 3.4 no longer uses protobuf 2.5. Our latest Docker images are 1.2GB in size, so if this leads to a smaller image footprint I'm all for it. Maybe also remove some "optional" packages from the Docker image.
On Wed, Dec 17, 2025 at 7:33 AM Edward Capriolo <[email protected]> wrote:

> Hello friends,
>
> I have packaged up Hadoop a number of ways over the years.
>
> Lately, since everyone loves Docker, I find my 80GB hard disk constantly
> filled by bulky or bloated images. I have to force these bloated images
> to "hit the gym".
>
> https://hub.docker.com/u/ecapriolo
>
> I have Spark, Zeppelin, and Livy running on Alpine with not much more
> than the JRE. I wanted to tackle Hadoop core next.
>
> https://issues.apache.org/jira/browse/HADOOP-19756
>
> A few funny fake blockers:
> 1) musl and the code in the ticket above
> 2) the old 2.5.0 protobuf; so many OSS problems that no one has even
> bothered packaging that protoc version for 6 years
> 3) the RHEL reliance on the NIS libraries
>
> Next, I realize the Hadoop "lean" package can't accommodate every case.
> But the lean one is still like 500MB of docs and 500MB of jars :)
> The timeline server and its libs are 150MB. Test jars are maybe 100 more.
> The native libs outside libhadoop are 180MB. (If you are on Alpine they
> are negligible anyway.)
>
> See the rm -rfs here:
>
> https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop
>
> Anyway, my goal is to have nice lean Alpine-based packages and more
> advanced Helm charts mirroring things I have done in Ansible: a setup
> with 2 NNs, 3 journal nodes, 2 RMs, 3 ZKs, etc.
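For anyone following along, the rm -rf trimming described above (docs, timeline server, test jars, native libs other than libhadoop) could be sketched as a small POSIX shell function. The exact paths are assumptions based on a stock Hadoop binary tarball layout; verify them against your own distribution before using anything like this:

```shell
#!/bin/sh
# Hedged sketch only: trim a Hadoop binary distribution in place.
# Paths assume the stock Apache tarball layout; check yours first.
trim_hadoop() {
  h="$1"
  # ~500MB of generated documentation
  rm -rf "$h/share/doc"
  # timeline server and its libs (~150MB) -- path is an assumption
  rm -rf "$h/share/hadoop/yarn/timelineservice"
  # test jars scattered through the share tree (~100MB)
  find "$h/share" -name '*-tests.jar' -type f -delete
  # native libs other than libhadoop; negligible/unusable on Alpine anyway
  [ -d "$h/lib/native" ] && \
    find "$h/lib/native" -type f ! -name 'libhadoop*' -delete
}
```

Usage would be something like `trim_hadoop /opt/hadoop` in a Dockerfile RUN step, ideally in the same layer that unpacks the tarball so the deleted files never land in an image layer.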
