Hadoop 3.4 doesn't use protobuf 2.5 anymore.
Our latest Docker images are 1.2GB in size. If this leads to a smaller Docker
image footprint, I'm all for it. Maybe also remove some "optional" packages
from the docker image.
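As a rough illustration of the kind of trimming being discussed, here is a hedged sketch of a script that strips the heavyweight optional pieces from an unpacked Hadoop distribution before it is baked into an image. The directory names (`share/doc`, `*-tests.jar`, `share/hadoop/yarn/timelineservice`) follow a typical Hadoop 3.x layout, but should be verified against the actual release; the `trim_hadoop` function name and the demo tree are purely illustrative.

```shell
#!/bin/sh
# Sketch only: trim a Hadoop distribution before building a Docker image.
# Paths are assumptions based on a typical Hadoop 3.x release layout.
set -eu

trim_hadoop() {
  hadoop_home="$1"

  # Documentation tree: hundreds of MB of HTML/javadoc, never read at runtime.
  rm -rf "$hadoop_home/share/doc"

  # Test jars are only needed to run Hadoop's own test suites.
  find "$hadoop_home/share" -name '*-tests.jar' -delete

  # YARN timeline service and its libs, if the cluster does not use it.
  rm -rf "$hadoop_home/share/hadoop/yarn/timelineservice"
}

# Demo on a fake layout so the script is runnable as-is.
demo="$(mktemp -d)"
mkdir -p "$demo/share/doc/hadoop" \
         "$demo/share/hadoop/yarn/timelineservice" \
         "$demo/share/hadoop/hdfs"
touch "$demo/share/hadoop/hdfs/hadoop-hdfs.jar" \
      "$demo/share/hadoop/hdfs/hadoop-hdfs-tests.jar"

trim_hadoop "$demo"

# Show what survived the trim.
find "$demo" -type f
```

In a Dockerfile this would run in an early build stage, with only the trimmed tree copied into the final image.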

On Wed, Dec 17, 2025 at 7:33 AM Edward Capriolo <[email protected]>
wrote:

> Hello friends,
>
> I have packaged Hadoop a number of ways over the years.
>
> Lately, since everyone loves Docker, I find my 80GB hard disk constantly
> filled with bulky, bloated images.
>
> I have to force these bloated images to "hit the gym".
>
> https://hub.docker.com/u/ecapriolo
> I have Spark, Zeppelin, and Livy running on Alpine with not much more
> than the JRE.
>
> I wanted to tackle hadoop core next.
>
> https://issues.apache.org/jira/browse/HADOOP-19756
>
> A few funny fake blockers:
> 1) musl and the code in the ticket above
> 2) the old 2.5.0 protobuf. It has so many OSS problems that no one has
> even bothered packaging that protoc version for 6 years.
> 3) the RHEL reliance on the NIS libraries
>
> Next, I realize the Hadoop "lean" package can't accommodate every case. But
> the lean package is like 500MB of docs and 500MB of jars :)
> The timeline server and its libs are 150MB. Test jars, maybe 100MB more. The
> native libs outside libhadoop are 180MB. (If you are on Alpine they are
> negligible anyway.)
>
> See the rm -rfs here.
>
> https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop
>
> Anyway, my goal is to have nice lean Alpine-based packages and more
> advanced Helm charts mirroring things I have done in Ansible: a setup with
> 2 NameNodes and 3 JournalNodes, 2 ResourceManagers, 3 ZooKeepers, etc.
>
