Re: [DISCUSS] Should we update our policies to include source provenance check

2024-04-02 Thread Christofer Dutz
Hi all,

I fully agree on this, and I would add to sebb’s statement that additional 
files can end up in a release even without malicious intent.
I have seen this several times, for example when Maven releases are built 
directly from the main checkout instead of letting the release plugin check 
out the release commit hash and build in a clean directory.

But indeed, this has also been worrying me, and I think it could become 
important with all the CRA and PLD stuff coming our way.

Chris


From: sebb 
Date: Tuesday, 2 April 2024 at 09:14
To: Jarek Potiuk 
Cc: security-discuss@community.apache.org, Users 
Subject: Re: [DISCUSS] Should we update our policies to include source 
provenance check
WARNING: this post mixes public and private lists.

In Commons reviewers are supposed to check that the source tarball
contents all match files from the tag in the vote.
The reason is mainly for provenance, and licensing, but it is not
unknown for spurious files to be accidentally added to the source
tarball.
Things can go wrong even without malicious intent.

So I agree that this is a vital part of the process, and should be
made explicit.
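
The check described above can be sketched in a few lines of Python. This is 
only an illustration, not the Commons procedure: it assumes the reviewer has 
already checked out the voted tag into a local directory, and the function 
names (tarball_digests, tree_digests, compare) as well as the single top-level 
directory stripped from the tarball paths are hypothetical choices.

```python
import hashlib
import tarfile
from pathlib import Path, PurePosixPath


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def tarball_digests(tar_path, strip_components=1):
    """Map each regular file in the tarball to its SHA-256 digest.

    strip_components drops the leading "project-x.y.z/" directory that
    source tarballs usually carry (an assumption about the layout).
    """
    digests = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            parts = PurePosixPath(member.name).parts[strip_components:]
            if parts:
                digests["/".join(parts)] = _sha256(tar.extractfile(member).read())
    return digests


def tree_digests(root, ignore=(".git",)):
    """Map each file below root (a checkout of the tag) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if rel.parts[0] in ignore or not path.is_file():
            continue
        digests[rel.as_posix()] = _sha256(path.read_bytes())
    return digests


def compare(tar_path, checkout_dir, strip_components=1):
    """Return (extra, missing, changed) file lists, tarball vs. checkout."""
    packaged = tarball_digests(tar_path, strip_components)
    tagged = tree_digests(checkout_dir)
    extra = sorted(set(packaged) - set(tagged))
    missing = sorted(set(tagged) - set(packaged))
    changed = sorted(
        name for name in set(packaged) & set(tagged)
        if packaged[name] != tagged[name]
    )
    return extra, missing, changed
```

Files reported as "extra" or "changed" are the ones a reviewer would want 
explained before voting. Files reported as "missing" are often legitimate 
(for example CI configuration that is deliberately left out of the package), 
so a project would probably keep an allow-list for those.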

On Tue, 2 Apr 2024 at 07:52, Jarek Potiuk  wrote:
>
> Following some of the learnings from CVE-2024-3094 (xz backdoor) and a 
> few resulting discussions, I would like to start a discussion on that very 
> specific topic:
>
> TL;DR: I think that we currently do not explicitly state the requirement of 
> verifying that the release manager has not tampered with the sources when 
> preparing the source package - and I believe we should be more explicit about 
> it and require PMC members to do such verification.
>
> As explained in [1] - there were three important triggers for the CVE to happen:
>
> a) the attacker was able to gain trust, become a maintainer and release 
> manager
> b) they submitted test binaries to the repository of xz that contained 
> malicious code
> c) acting as release manager - they modified the official source tarball 
> packages of xz to contain a modified Makefile that causes anyone building 
> from the official source tarball packages to produce a malicious version of 
> the xz library (that malicious Makefile had never been part of the source 
> repository, and it was neither reviewed nor approved by anyone).
>
> When I look at requirements explained in our release policy, I think this  
> kind of scenario (especially point c) is not something our release policies 
> protect us against, because we have no requirement to check provenance of the 
> source code in the released package.
>
> Or at least I cannot find it in either the release policy [2] or the 
> distribution policy [3].
>
> From [2]:
>
> > Before casting +1 binding votes, individuals are REQUIRED to download all 
> > signed source code packages onto their own hardware, verify that they meet 
> > all requirements of ASF policy on releases as described below, validate all 
> > cryptographic signatures, compile as provided, and test the result on their 
> > own platform.
>
> Even if we assume such a check is part of "meet all requirements of ASF 
> policy on releases" - there is no "check if the sources in the package have 
> not been modified vs. source repository" anywhere in the policies as far as I 
> can see.
>
> From earlier discussions - many of us think that verifying whether the 
> sources in the "source" package contain the same sources as ones stored in 
> our source repositories is the most important part of such verification, but 
> - somewhat to my surprise - it has not been explicitly stated in our 
> policies. And I think it should be an important gate: PMC members should be 
> REQUIRED to verify that. That could be done in whatever way is appropriate 
> for the project - it could be just comparing sources with git, or having 
> reproducible packages that PMC members can build and compare for binary 
> identity if the project supports it.
>
> But similarly to comparing cryptographic signatures, possibly we should 
> explicitly state that this should be a mandatory check. And maybe we have the 
> chance to use the CVE-2024-3094 as an opportunity to remind/advocate it and 
> explain what could happen if this step is missing when our source packages 
> are released?
>
> I think the way it's stated, a malicious release manager could do a similar 
> package modification as the xz release manager did and we could have missed it. 
> Our policies on release do not have explicit gates protecting against this - 
> so PMC members who explicitly give +1, following the release policy pretty 
> rigorously, might not realise that the malicious release manager did such
>
> Just to give an example from the past Airflow releases - when I came to the 
> project, I learned how it works, and the release process was very 
> rigorously followed, including licences, signatures, etc. and we even pulled 
> a few releases when those were not met. But it's only a few years later when 
> I became a release manager and 

Re: Reproducible builds [Airflow] -> done

2024-01-14 Thread Christofer Dutz
In PLC4X we’re also working on enabling fully reproducible builds.
As PLC4X contains Java, C, C#, Go, Python, … it was a bit of a challenge, but I 
think we’ll cross the finish line with our upcoming release.

We basically gave up on the idea of configuring our tooling to produce the same 
output on every Java version, operating system and CPU architecture. Instead, 
we simply defined a reference platform in the form of a Dockerfile and use 
docker-compose to “ship the reference build machine”.

So the idea is to execute all release operations in this reference build 
machine and to also use it to validate releases.

The building and staging of releases is already done. We still need to finish 
some of the missing parts, but I am hopeful that we’ll reach the final goal 
soon.
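
Validating a release against a rebuild from the reference machine boils down to 
comparing artifacts byte for byte. The following Python sketch illustrates 
that step only. It is not the PLC4X tooling: the two directory arguments (one 
holding the staged artifacts, one holding the artifacts rebuilt inside the 
reference container) and the names sha512_of and verify_rebuild are 
assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_rebuild(staged_dir, rebuilt_dir):
    """Compare every staged artifact with the rebuilt file of the same name.

    Returns a dict mapping artifact name to "identical", "differs" or
    "not rebuilt".
    """
    staged_dir, rebuilt_dir = Path(staged_dir), Path(rebuilt_dir)
    report = {}
    for staged in sorted(p for p in staged_dir.iterdir() if p.is_file()):
        rebuilt = rebuilt_dir / staged.name
        if not rebuilt.is_file():
            report[staged.name] = "not rebuilt"
        elif sha512_of(staged) != sha512_of(rebuilt):
            report[staged.name] = "differs"
        else:
            report[staged.name] = "identical"
    return report
```

Anything other than "identical" for every artifact would be a reason to hold 
the vote until the difference is understood.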

Chris

From: Gary Gregory 
Date: Sunday, 14 January 2024 at 20:28
To: security-discuss@community.apache.org 
Cc: builds 
Subject: Re: Reproducible builds [Airflow] -> done
Congratulations Jarek :-)

I was glad to see a reference to
https://maven.apache.org/guides/mini/guide-reproducible-builds.html

I hope we can keep that Maven page up-to-date with whatever comes up so we
only have to look in one place ;-)

Gary

On Sun, Jan 14, 2024, 2:06 PM Jarek Potiuk  wrote:

> Hey everyone,
>
> I just wanted to share a little accomplishment I (mostly) implemented in
> Airflow - I just merged the last PR to get fully reproducible builds for
> all the Python artifacts we produce and publish in downloads.apache.org
> (python whl, sdist packages, source tarballs).
>
> All our 90 or so artifacts are now fully reproducible and we check
> reproducibility of them as a mandatory step of PMC verification when voting
> the releases. Initially I thought it's not THAT needed for us in the Python
> world, but I got the "let's be reproducible" bug implanted at the
> "reproducible builds" talk at the ApacheCon in Halifax by Hervé Boutemy and
> it stuck - until I got it completed.
>
> And yeah. It simplified our artifact verification quite a lot and made our
> process much more robust.
>
> Arnout just created this page
> https://cwiki.apache.org/confluence/display/SECURITY/Reproducible+Builds
> where we might gather some notes and guidelines around reproducibility -
> and I added some Python notes and links to our small snippets for making the
> packages reproducible in a "nicer" way. For example we have now pretty accurate -
> yet reproducible - dates in the packages as we store source-date-epoch in
> our repository and bump it automatically as part of our release preparation
> process. Might be a good idea if others keep their notes there as well
> to share experiences.
>
> As a side note (and maybe sparking a bit of a Java/Maven vs. Python
> battle), I used to have a bit of an inferiority complex when comparing the
> build system state in Python - where we did not have such good standards and
> tooling had proliferated and was complex. But as part of preparation there I
> had to basically move Airflow to the modern packaging world of Python (we
> accumulated quite some tech debt and had custom solutions there) but I
> learned that with the approach of separate build frontend and build
> backends, based on multiple PEP-standards, Python quite leapfrogged the
> Java world IMHO. It's a bit of a marvel of what the Python Packaging team
> accomplished in the last few years when it comes to standard adoption and
> tooling. I am honestly quite impressed with it, and it's just the beginning
> to be honest. And it makes our contributors' lives so much easier - where
> they can stick to whatever frontend they like (Hatch, poetry, flit, pip,
> )  and it seamlessly works with a nicely integrated and customizable
> build backend of our choice (hatchling in this case).
>
> Looking forward to other stories there in the future.
>
> J.
>
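
The source-date-epoch approach mentioned in the message above can be 
illustrated with a small Python sketch. It is not Airflow's actual build code: 
the function name build_reproducible_tarball, the fixed permission bits and 
the choice of GNU tar format are assumptions for the example. The idea is that 
every value which would otherwise vary between machines (timestamps, owner, 
file order, gzip header) is pinned, so two builds of the same sources give the 
same bytes.

```python
import gzip
import io
import tarfile
from pathlib import Path


def build_reproducible_tarball(src_dir, out_path, epoch):
    """Write src_dir to out_path (.tar.gz) with all variable metadata pinned.

    epoch is the timestamp stored in the repository (the value usually
    exported as SOURCE_DATE_EPOCH).
    """
    src_dir = Path(src_dir)

    def normalise(info):
        # Pin everything that depends on the build machine or build time.
        info.mtime = epoch
        info.uid = info.gid = 0
        info.uname = info.gname = ""
        info.mode = 0o755 if (info.isdir() or info.mode & 0o100) else 0o644
        return info

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorted traversal gives a stable member order.
        for path in sorted(src_dir.rglob("*")):
            tar.add(
                path,
                arcname=path.relative_to(src_dir).as_posix(),
                recursive=False,
                filter=normalise,
            )

    with open(out_path, "wb") as raw:
        # An empty filename and a fixed mtime keep the gzip header stable.
        with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=epoch) as gz:
            gz.write(buffer.getvalue())
```

Building the same tree twice with the same epoch should then produce 
byte-identical files, which is what makes a checksum comparison during the 
vote meaningful.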