Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread Jakub Zelenka
Hi,

On Tue, Apr 2, 2024 at 7:14 PM Stanislav Malyshev 
wrote:

> Hi!
>
> That is something PHP is missing atm, no one can verify the build process
>> for releases.
>>
>
> Yes that's what I was suggesting. This should be done by RM. In that way,
> the RM becomes more someone that verifies the build and not the actual
> person that provides the build.
>
> I'm not sure though how the RM can really verify it. I mean, we have the
> tar blob that comes from the git repo - which we assume is legit. We also
> have some files that aren't in the repo. If RM builds them by themselves
> then the question comes up what if RM's environment is compromised and
> something bad is injected. If RM receives the files from outside source,
> how the RM verifies they are genuine?  I don't think reading through the
> whole "configure" file and verifying it's not bad is realistic for any
> person. And from what I understand, "configure" and such are quite
> environment-dependant, so you can't just have a standard hash to compare
> to. You can't have the RM to just run "buildconf" again and do hash check
> because they may get different bits than the ones coming from the outside,
> like CI. I dunno, maybe if we had some kind of Docker image for generating
> it that would produce reproducible result, that'd be possible? Otherwise I
> am still not sure how the verification procedure looks like.
>

Yeah as I already noted that it needs to be reproducible so the RM would
need to have exactly the same version of all build tools as used in CI. I
think the only option would be to use Docker image for that. We could then
use the same image in CI (job container). In such way we should be able to
implement the same process (there might some extra bits to do but I think
it should be doable in general). We could potentially store the produced
hashes to some CI artifact and possibly also make it available from the
downloads server (once downloaded from CI) so the RM could have a script
that just automatically compare all hashes. So the ideal scenario would be
that RM just runs a command that will do all for them.


> Right now as I understand we're simply trusting the RM that they have
> uncompromised environment and third parties have no way to verify it's the
> case. But I guess it's time we do better?
>

Yes exactly that. Currently the RM can change the build as they want so if
they are compromised, then we might have the same issue that happened to XZ.

Regards

Jakub


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread Stanislav Malyshev

Hi!



That is something PHP is missing atm, no one can verify the build
process for releases.


Yes that's what I was suggesting. This should be done by RM. In that 
way, the RM becomes more someone that verifies the build and not the 
actual person that provides the build.


I'm not sure though how the RM can really verify it. I mean, we have the 
tar blob that comes from the git repo - which we assume is legit. We 
also have some files that aren't in the repo. If RM builds them by 
themselves then the question comes up what if RM's environment is 
compromised and something bad is injected. If RM receives the files from 
outside source, how the RM verifies they are genuine?  I don't think 
reading through the whole "configure" file and verifying it's not bad is 
realistic for any person. And from what I understand, "configure" and 
such are quite environment-dependant, so you can't just have a standard 
hash to compare to. You can't have the RM to just run "buildconf" again 
and do hash check because they may get different bits than the ones 
coming from the outside, like CI. I dunno, maybe if we had some kind of 
Docker image for generating it that would produce reproducible result, 
that'd be possible? Otherwise I am still not sure how the verification 
procedure looks like.


Right now as I understand we're simply trusting the RM that they have 
uncompromised environment and third parties have no way to verify it's 
the case. But I guess it's time we do better?


Thanks,

Stas


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread Olle Härstedt
internals+unsubscr...@lists.php.net -  550 5.7.1 Looks like spam to me.

Can't unsub...?

Den tis 2 apr. 2024 kl 16:46 skrev Jakub Zelenka :

> On Tue, Apr 2, 2024 at 3:35 PM tag Knife  wrote:
>
>>
>> On Tue, 2 Apr 2024 at 14:53, Jakub Zelenka  wrote:
>>
>>> We will still need RM to sign the build so ideally we should make it
>>> reproducible so RM can verify that CI produced expected build and then sign
>>> it and just upload the signatures (not sure if we actually need signature
>>> uploaded or if they are used just in announcements).
>>>
>>> I think this should then prevent compromise of the RM and CI unless CI
>>> is compromised by RM, of course, but that should be very unlikely.
>>>
>>> Regards
>>>
>>> Jakub
>>>
>>>
>> On the side of the CI being compromised, this does happen, typically with
>> authed
>> private hosted CI, like jenkins. But if its open and accessible to
>> everyone to monitor, such
>> as github actions, everyone can monitor and audit the build logs to
>> verify the commands
>> ran and nothing unexpected happened during build.
>>
>> That is something PHP is missing atm, no one can verify the build process
>> for releases.
>>
>
> Yes that's what I was suggesting. This should be done by RM. In that way,
> the RM becomes more someone that verifies the build and not the actual
> person that provides the build.
>
> Regards
>
> Jakub
>
>
>


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread Jakub Zelenka
On Tue, Apr 2, 2024 at 3:35 PM tag Knife  wrote:

>
> On Tue, 2 Apr 2024 at 14:53, Jakub Zelenka  wrote:
>
>> We will still need RM to sign the build so ideally we should make it
>> reproducible so RM can verify that CI produced expected build and then sign
>> it and just upload the signatures (not sure if we actually need signature
>> uploaded or if they are used just in announcements).
>>
>> I think this should then prevent compromise of the RM and CI unless CI is
>> compromised by RM, of course, but that should be very unlikely.
>>
>> Regards
>>
>> Jakub
>>
>>
> On the side of the CI being compromised, this does happen, typically with
> authed
> private hosted CI, like jenkins. But if its open and accessible to
> everyone to monitor, such
> as github actions, everyone can monitor and audit the build logs to verify
> the commands
> ran and nothing unexpected happened during build.
>
> That is something PHP is missing atm, no one can verify the build process
> for releases.
>

Yes that's what I was suggesting. This should be done by RM. In that way,
the RM becomes more someone that verifies the build and not the actual
person that provides the build.

Regards

Jakub


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread tag Knife
On Tue, 2 Apr 2024 at 14:53, Jakub Zelenka  wrote:

> We will still need RM to sign the build so ideally we should make it
> reproducible so RM can verify that CI produced expected build and then sign
> it and just upload the signatures (not sure if we actually need signature
> uploaded or if they are used just in announcements).
>
> I think this should then prevent compromise of the RM and CI unless CI is
> compromised by RM, of course, but that should be very unlikely.
>
> Regards
>
> Jakub
>
>
On the side of the CI being compromised, this does happen, typically with
authed
private hosted CI, like jenkins. But if its open and accessible to everyone
to monitor, such
as github actions, everyone can monitor and audit the build logs to verify
the commands
ran and nothing unexpected happened during build.

That is something PHP is missing atm, no one can verify the build process
for releases.


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread Jakub Zelenka
Hi,

On Tue, Apr 2, 2024 at 2:36 PM Derick Rethans  wrote:

> On Sat, 30 Mar 2024, Jakub Zelenka wrote:
>
> > On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta 
> wrote:
> > >
> > > I understand that the XZ project had signed releases too: that still
> > > means that downstream consumers would need to trust the release
> > > managers anyway, and reproduce the whole chain themselves.
> > >
> > > I suppose that's part of OP's concern.
> > >
> > I agree that compromised RM is a problem that we should look into.
> >
> > We have been actually already discussing something similar. I have
> > been thinking about it and it could be potentially used for all
> > builds. The idea is that we would setup worklfow on CI that would run
> > on tag push and it would call (authenticated https request)
> > downloads.php.net server that could do the actual build, sign them and
> > return the hashes to the CI job which would display them and do extra
> > verification (probably its own build to verify that download server
> > work as expected).
>
> ...
>
> > It needs more thinking to iron out all details and make sure it is a
> > secure but I think it would be something worth to look at.
>
> I don't mind coming up with an automated way, but we probably should not
> use the *downloads* server. All it does is serve files. It has no
> compiler or anything else. It's a storage optimised instance with little
> CPU.
>
>
Yeah I agree. I originally thought that it would be good to do it on our
own server so we can possibly sign it there as well but after thinking
about it I rejected that signing idea so there's really no point to do it
on our own server.


> On CI we already test the builds, what does stop us from also just
> having it make the tarball and attach it as an artefact? We can then
> setup somethin gon the downloads server to pull these artefacts. In
> fact, this is exactly what we're already hoping to do for Windows
> downloads too. Having it all in one place is probably even better (and
> easier).
>
> Of course, having CI make the tarballs means we need to trust that CI
> isn't compromised ;-).
>

We will still need RM to sign the build so ideally we should make it
reproducible so RM can verify that CI produced expected build and then sign
it and just upload the signatures (not sure if we actually need signature
uploaded or if they are used just in announcements).

I think this should then prevent compromise of the RM and CI unless CI is
compromised by RM, of course, but that should be very unlikely.

Regards

Jakub


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-02 Thread Derick Rethans
On Sat, 30 Mar 2024, Jakub Zelenka wrote:

> On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta  wrote:
> >
> > I understand that the XZ project had signed releases too: that still 
> > means that downstream consumers would need to trust the release 
> > managers anyway, and reproduce the whole chain themselves.
> >
> > I suppose that's part of OP's concern.
> >
> I agree that compromised RM is a problem that we should look into.
> 
> We have been actually already discussing something similar. I have 
> been thinking about it and it could be potentially used for all 
> builds. The idea is that we would setup worklfow on CI that would run 
> on tag push and it would call (authenticated https request) 
> downloads.php.net server that could do the actual build, sign them and 
> return the hashes to the CI job which would display them and do extra 
> verification (probably its own build to verify that download server 
> work as expected).

...

> It needs more thinking to iron out all details and make sure it is a 
> secure but I think it would be something worth to look at.

I don't mind coming up with an automated way, but we probably should not 
use the *downloads* server. All it does is serve files. It has no 
compiler or anything else. It's a storage optimised instance with little 
CPU.

On CI we already test the builds, what does stop us from also just 
having it make the tarball and attach it as an artefact? We can then 
setup somethin gon the downloads server to pull these artefacts. In 
fact, this is exactly what we're already hoping to do for Windows 
downloads too. Having it all in one place is probably even better (and 
easier).

Of course, having CI make the tarballs means we need to trust that CI 
isn't compromised ;-).

cheers,
Derick

-- 
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support

mastodon: @derickr@phpc.social @xdebug@phpc.social

Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-04-01 Thread Robert Landers
On Mon, Apr 1, 2024 at 1:53 AM Ben Ramsey  wrote:
>
> > On Mar 31, 2024, at 11:08, Robert Landers  wrote:
> >
> > There are probably multiple parties that require trust: the people
> > hosting the CI servers, the people with access to the CI servers, the
> > RM, and maybe more that I can't think of right now.
> >
> > One option would be to have
> >
> > - CI push the code + generated files to a git-branch `php-8.3-built`
> > (or something) so that changes can be reviewed, along with the
> > tarball.
> > - CI signs the commit and tarball.
> > - RM checks out commit and, also signs the tarball, then does a git
> > commit --amend --signoff and "blesses" the commit
> > - RM releases tarball
>
>
> When I was considering this and created a PR that followed these steps, I 
> discussed the process with folks from other open source communities, notably 
> the Apache Software Foundation community, since some of their projects follow 
> similar processes. The notion of automating the build and signing it on a 
> remote machine, only to be inspected and signed again on the release 
> manager’s machine was outright rejected by everyone. The machine where it is 
> signed by the RM should be the machine where it is built, according to 
> everyone I spoke with.
>
> As it stands right now, if we build the tarball on a remote machine (in CI), 
> and then the RM wants to compare it and build it locally, the hashes on those 
> tarballs will be different because we can’t guarantee reproducible builds. If 
> we could guarantee reproducible builds, then maybe this process could work, 
> but it would still require the RM to build it locally from the source tag in 
> order to trust and verify that nothing sneaked in on the CI machine.
>
> Cheers,
> Ben
>

I think the big point is to store the generated files in git for CI
builds. To verify the tarball is that commit, checkout the branch and
untar the file, there should be no changes, git clean should result in
no removed files, etc. This would make injecting malicious code
visible, at the very least. Whether someone catches it and actually
reviews the generated files is a different question. But if we wanted
something that is better than nothing... it's a pretty simple
solution.

Reproducible builds is an orthogonal but related problem.


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-31 Thread Ben Ramsey
> On Mar 31, 2024, at 11:08, Robert Landers  wrote:
> 
> There are probably multiple parties that require trust: the people
> hosting the CI servers, the people with access to the CI servers, the
> RM, and maybe more that I can't think of right now.
> 
> One option would be to have
> 
> - CI push the code + generated files to a git-branch `php-8.3-built`
> (or something) so that changes can be reviewed, along with the
> tarball.
> - CI signs the commit and tarball.
> - RM checks out commit and, also signs the tarball, then does a git
> commit --amend --signoff and "blesses" the commit
> - RM releases tarball


When I was considering this and created a PR that followed these steps, I 
discussed the process with folks from other open source communities, notably 
the Apache Software Foundation community, since some of their projects follow 
similar processes. The notion of automating the build and signing it on a 
remote machine, only to be inspected and signed again on the release manager’s 
machine was outright rejected by everyone. The machine where it is signed by 
the RM should be the machine where it is built, according to everyone I spoke 
with.

As it stands right now, if we build the tarball on a remote machine (in CI), 
and then the RM wants to compare it and build it locally, the hashes on those 
tarballs will be different because we can’t guarantee reproducible builds. If 
we could guarantee reproducible builds, then maybe this process could work, but 
it would still require the RM to build it locally from the source tag in order 
to trust and verify that nothing sneaked in on the CI machine.

Cheers,
Ben



signature.asc
Description: Message signed with OpenPGP


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-31 Thread Robert Landers
On Sun, Mar 31, 2024 at 5:26 PM Christian Schneider
 wrote:
>
> Am 30.03.2024 um 16:35 schrieb Daniil Gentili :
> >> That would break lots of tools as it requires extra dependencies so it is 
> >> not something that would could in stable versions.
> > Btw, I do not believe that "it would require end users to install autotools 
> > and bison in order to compile PHP from tarballs" is valid reason to delay 
> > the patching of a serious attack vector ASAP.
>
> I agree with Jakub that removing configure would just shift the problem, not 
> solve it, while at the same time puts a new burden on people compiling PHP 
> from downloaded archives.
>
> But my main question is: I fail to see the difference whether I plant my 
> malicious code in configure, configure.ac or *.c: Someone has to review the 
> changes and notice the problem. And we have to trust the RMs. What am I 
> missing?
>
> Regards,
> - Chris

There are probably multiple parties that require trust: the people
hosting the CI servers, the people with access to the CI servers, the
RM, and maybe more that I can't think of right now.

One option would be to have

- CI push the code + generated files to a git-branch `php-8.3-built`
(or something) so that changes can be reviewed, along with the
tarball.
- CI signs the commit and tarball.
- RM checks out commit and, also signs the tarball, then does a git
commit --amend --signoff and "blesses" the commit
- RM releases tarball


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-31 Thread Rowan Tommins [IMSoP]

On 31/03/2024 14:53, Christian Schneider wrote:

But my main question is: I fail to see the difference whether I plant my
  malicious code in configure, configure.ac or *.c: Someone has to review
  the changes and notice the problem. And we have to trust the RMs. What
am I missing?



As I understand it, the attack being discussed involved*code that was never 
committed to version control*. The bulk of the payload was committed in fake 
binary test artifacts, which are unlikely to be inspected but harmless by 
themselves; but the trigger to incorporate it into the binary was 
added*manually*  in between the automated build and producing the signed 
release archive.

So the theory is that if there's no human involved in that process, there is no 
way for a human to introduce a malicious change at that step. An exploit would 
need to be introduced somewhere in version controlled, human-readable, code; 
giving extra chances for it to be detected.


On 30/03/2024 18:24, Jakub Zelenka wrote:
Do you think it would be different if the change happened in the 
distributed source file instead? I mean you could still modify tarball 
of the distributed file (e.g. hide somewhere in configure.ac or in our 
case more easily in less visible files like various Makefile.frag and 
similar). The only thing that you get by using just VCS files is that 
people could hash the distributed content of the files and compare it 
with the hash of the VCS files but does anyone do this sort of 
verification?



We already use a version control system built entirely on comparing hashes of source 
files. So given a signed tarball that claimed to match the content of a signed tag, any 
user can trivially check out the tag, expand the tarball, and run "git diff" to 
detect any anomalies.

The question of who would do that in practice is a valid one, and something 
that I'm sure has been discussed elsewhere regarding reproducible binary builds.



On 30/03/2024 15:35, Daniil Gentili wrote:
Btw, I do not believe that "it would require end users to install 
autotools and bison in order to compile PHP from tarballs" is valid 
reason to delay the patching of a serious attack vector ASAP.



As is always the case, there is a trade-off between security and 
convenience - in this case, distributing something that's usable without 
large amounts of extra tooling (including, for some generated files, a 
copy of PHP itself), vs distributing something that is 100% reviewable 
by humans.


Ultimately, 99.999% of users are not going to compile their own copy of 
PHP from source; they are going to trust some chain of providers to take 
the source, perform all the necessary build steps, and produce a binary. 
Removing generated files from the tarballs doesn't eliminate that need 
for trust, it just shifts more of it to organisations like Debian and 
RedHat; and maybe that's a valid aim, because those organisations have 
more resources than us to build appropriate processes.


Making things reproducible aims to attack the same problem from a 
different angle: rather than placing more trust in one part of the 
chain, it allows multiple parallel chains, which should all give the 
same result. If builds from different sources start showing unexplained 
differences, it can be flagged automatically.



Regards,

--
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-31 Thread Christian Schneider
Am 30.03.2024 um 16:35 schrieb Daniil Gentili :
>> That would break lots of tools as it requires extra dependencies so it is 
>> not something that would could in stable versions.
> Btw, I do not believe that "it would require end users to install autotools 
> and bison in order to compile PHP from tarballs" is valid reason to delay the 
> patching of a serious attack vector ASAP.

I agree with Jakub that removing configure would just shift the problem, not 
solve it, while at the same time puts a new burden on people compiling PHP from 
downloaded archives.

But my main question is: I fail to see the difference whether I plant my 
malicious code in configure, configure.ac or *.c: Someone has to review the 
changes and notice the problem. And we have to trust the RMs. What am I missing?

Regards,
- Chris


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Jakub Zelenka
On Sat, Mar 30, 2024 at 5:46 PM Ben Ramsey  wrote:

> On Mar 30, 2024, at 07:03, Jakub Zelenka  wrote:
>
> 
> Hi,
>
> On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta  wrote:
>
>>
>>
>> On Sat, 30 Mar 2024, 05:19 Ben Ramsey,  wrote:
>>
>>> On Mar 29, 2024, at 20:20, Bob Weinand  wrote:
>>>
>>> 
>>> On 29.3.2024 23:31:26, Daniil Gentili wrote:
>>>
>>> In light of the recent supply chain attack in xz/lzma, leading to a
>>> backdoor in openSSH (
>>> https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe
>>> that it would be a good idea to remove the huge attack surface offered by
>>> the pre-generated autoconf build scripts and lexers, offered in the release
>>> tarballs.
>>>
>>> In particular, the xz supply chain attack injected the exploit with a
>>> few obfuscated lines, manually added to the end of the pre-generated
>>> configure script, that was only bundled in the tarballs.
>>>
>>> Even if the exploits themselves were committed to the repo in the form
>>> of test files, the code that actually injected the exploit in the library
>>> was not committed to the repo, and was only present in the pre-generated
>>> configure script in the tarball: this injection mode makes sense, as extra
>>> files in the tarball not present in the git repo would raise suspicions,
>>> but machine-generated configure scripts containing hundreds of thousands of
>>> lines of code not present in the upstream VCS are the norm, and are usually
>>> not checked before execution.
>>>
>>> Specifically in the case of PHP, along from the configure script, the
>>> tarball also bundles generated lexer files which contain actual C code,
>>> which is an additional attack vector, i.e. here's the diff between the
>>> tarball of the 8.3.4 release, and the PHP-8.3.4 tag on the git repo:
>>>
>>> ```
>>> ~ $ diff -r php-8.3.4 php-src -q
>>> Only in php-src:
>>> .git  Files
>>> php-8.3.4/NEWS and php-src/NEWS differ   Files
>>> php-8.3.4/Zend/zend.h and php-src/Zend/zend.h differ Only
>>> in php-8.3.4/Zend: zend_ini_parser.c
>>> Only in php-8.3.4/Zend: zend_ini_parser.h
>>> Only in php-8.3.4/Zend:
>>> zend_ini_parser.output Only in php-8.3.4/Zend:
>>> zend_ini_scanner.c
>>> Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
>>> Only in php-8.3.4/Zend:
>>> zend_language_parser.c Only in php-8.3.4/Zend:
>>> zend_language_parser.h Only in php-8.3.4/Zend:
>>> zend_language_parser.output
>>> Only in php-8.3.4/Zend: zend_language_scanner.c
>>> Only in php-8.3.4/Zend:
>>> zend_language_scanner_defs.h   Only in php-8.3.4:
>>> configure   Files php-8.3.4/
>>> configure.ac and php-src/configure.ac differ   Only in
>>> php-8.3.4/ext/json: json_parser.tab.c  Only in
>>> php-8.3.4/ext/json: json_parser.tab.h
>>> Only in php-8.3.4/ext/json: json_scanner.c
>>> Only in php-8.3.4/ext/json:
>>> php_json_scanner_defs.hOnly in php-8.3.4/ext/pdo:
>>> pdo_sql_parser.c
>>> Only in php-8.3.4/ext/phar:
>>> phar_path_check.c  Only in
>>> php-8.3.4/ext/standard: url_scanner_ex.c
>>> Only in php-8.3.4/ext/standard: var_unserializer.c
>>> Only in php-8.3.4/main: php_config.h.in
>>> Files php-8.3.4/main/php_version.h and php-src/main/php_version.h
>>> differ   Only in php-8.3.4/pear:
>>> install-pear-nozlib.phar   Only in
>>> php-8.3.4/sapi/phpdbg: phpdbg_lexer.c  Only in
>>> php-8.3.4/sapi/phpdbg: phpdbg_parser.c Only in
>>> php-8.3.4/sapi/phpdbg: phpdbg_parser.h
>>> Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output
>>> ```
>>>
>>> To prevent attacks from malevolent/compromised RMs, I propose completely
>>> removing all autogenerated files from the release tarballs, and ensuring
>>> their content exactly matches the content of the associated git tag (this
>>> means also removing the -dev prefix from the version number in
>>> main/php_version.h, Zend/zend.h, configure.ac and NEWS in the git tag).
>>>
>>> Of course this means that users will have to generate the build scripts
>>> when compiling PHP, as when installing PHP from the VCS repo.
>>>
>>> I'm sending a copy of this email to secur...@php.net as well.
>>>
>>> Hey Daniil,
>>>
>>> You can also have a public CI (i.e. a github action) generate the
>>> artifacts, along with hash computation.
>>> It should be a github action which runs on tags. This makes it fully
>>> verifiable; i.e. the code for the generation of action, including the hash.
>>> Anyone who wants can trivially trace this back.
>>>
>>> There's nothing in the tarballs which cannot be trivially automated and
>>> made verifiable.
>>>
>>> I don't think providing pre-generated files is fundamentally flawed, the
>>> primary lacking thing is 

Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Jakub Zelenka
Hi,

On Sat, Mar 30, 2024 at 3:33 PM Daniil Gentili 
wrote:

> It is also pretty standard thing to distribute configure files (which is
> the file that probably matters most).
>
> The current standard way of distributing generated configure files in
> tarballs is precisely what allowed the xz supply chain attack to go
> unnoticed for so long.
>
>
Do you think it would be different if the change happened in the
distributed source file instead? I mean you could still modify tarball of
the distributed file (e.g. hide somewhere in configure.ac or in our case
more easily in less visible files like various Makefile.frag and similar).
The only thing that you get by using just VCS files is that people could
hash the distributed content of the files and compare it with the hash of
the VCS files but does anyone do this sort of verification?

If you meant using Git archive, then it's not a good idea because it
doesn't have a long term hash stability:
https://github.com/orgs/community/discussions/46034 .


> I strongly believe all projects using autotools, including PHP, should
> switch away from this "standard" way of doing things.
>
> Also don't forget that we need to also provide Windows builds which are
> binaries so we need some sort of verification of this type in any case.
>
> Of course, build reproducibility is a very good thing, but when a user
> downloads a binary, they're aware that they're getting a compiled blob
> which might contain injected malicious code (especially if there's no build
> reproducibility); when a user downloads a source tarball, there's a false
> sense of security rooted in the mistaken belief that the source code in the
> tarball matches the one distributed in the VCS, but in reality, the tarball
> also contains potentially malicious semi-compiled blobs, not present in the
> VCS.
>
>
> It would require compromising the CI as well as the download serves
> happening at the same time which seems to me like an impossible scenario.
>
> I misunderstood your original message, I thought you meant that there
> would be some new CI system hosted on downloads.php.net dedicated to
> verifying, not the current GHA CI system, which is configured on the public
> VCS.
>
> If GHA is used for verifying builds, that would make more sense, but then
> users would be required to check the status of a github pipeline to
> validate that the tarball was not compromised (or alternatively clone from
> source and re-generate the build scripts manually, or simply trust the
> release manager, which brings us to square 1).
>

As noted above, you would need to do much more involved verification if
only generated source codes were removed. With GHA it would be just failed
build and we could integrate notification (e.g. to Foundation Slack) so
more people could be easily aware if there is such problem.

Regards

Jakub


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Ben Ramsey
On Mar 30, 2024, at 07:03, Jakub Zelenka  wrote:Hi,On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta  wrote:On Sat, 30 Mar 2024, 05:19 Ben Ramsey,  wrote:On Mar 29, 2024, at 20:20, Bob Weinand  wrote:

  
  
On 29.3.2024 23:31:26, Daniil Gentili
  wrote:


  
  
  In light of
the recent supply chain attack in xz/lzma, leading to a backdoor
in openSSH
(https://www.openwall.com/lists/oss-security/2024/03/29/4), I
believe that it would be a good idea to remove the huge attack
surface offered by the pre-generated autoconf build scripts and
lexers, offered in the release tarballs. 
  
  In
particular, the xz supply chain attack injected the exploit with
a few obfuscated lines, manually added to the end of the
pre-generated configure script, that was only bundled in the
tarballs. 
  
  Even if the
exploits themselves were committed to the repo in the form of
test files, the code that actually injected the exploit in the
library was not committed to the repo, and was only present in
the pre-generated configure script in the tarball: this
injection mode makes sense, as extra files in the tarball not
present in the git repo would raise suspicions, but
machine-generated configure scripts containing hundreds of
thousands of lines of code not present in the upstream VCS are
the norm, and are usually not checked before execution. 
  
  Specifically
in the case of PHP, along from the configure script, the tarball
also bundles generated lexer files which contain actual C code,
which is an additional attack vector, i.e. here's the diff
between the tarball of the 8.3.4 release, and the PHP-8.3.4 tag
on the git repo: 
  
  ```
  
  ~ $ diff -r
php-8.3.4 php-src -q 
  Only in
php-src:
.git  Files
php-8.3.4/NEWS and php-src/NEWS
differ   Files php-8.3.4/Zend/zend.h
and php-src/Zend/zend.h differ Only in
php-8.3.4/Zend: zend_ini_parser.c 
  Only in
php-8.3.4/Zend: zend_ini_parser.h 
  Only in
php-8.3.4/Zend:
zend_ini_parser.output Only in
php-8.3.4/Zend: zend_ini_scanner.c 
  Only in
php-8.3.4/Zend: zend_ini_scanner_defs.h 
  Only in
php-8.3.4/Zend:
zend_language_parser.c Only in
php-8.3.4/Zend:
zend_language_parser.h Only in
php-8.3.4/Zend: zend_language_parser.output 
  Only in
php-8.3.4/Zend: zend_language_scanner.c 
  Only in
php-8.3.4/Zend:
zend_language_scanner_defs.h   Only in
php-8.3.4:
configure   Files
php-8.3.4/configure.ac and php-src/configure.ac
differ   Only in php-8.3.4/ext/json:
json_parser.tab.c  Only in
php-8.3.4/ext/json: json_parser.tab.h 
  Only in
php-8.3.4/ext/json: json_scanner.c 
  Only in
php-8.3.4/ext/json:
php_json_scanner_defs.h    Only in
php-8.3.4/ext/pdo: pdo_sql_parser.c 
  Only in
php-8.3.4/ext/phar:
phar_path_check.c  Only in
php-8.3.4/ext/standard: url_scanner_ex.c 
  Only in
php-8.3.4/ext/standard: var_unserializer.c 
  Only in
php-8.3.4/main: php_config.h.in 
  Files
php-8.3.4/main/php_version.h and php-src/main/php_version.h
differ   Only in php-8.3.4/pear:
install-pear-nozlib.phar   Only in
php-8.3.4/sapi/phpdbg:
phpdbg_lexer.c  Only in
php-8.3.4/sapi/phpdbg:
phpdbg_parser.c Only in
php-8.3.4/sapi/phpdbg: phpdbg_parser.h 
  Only in
php-8.3.4/sapi/phpdbg: phpdbg_parser.output 
  ```
  
  
  To prevent
attacks from malevolent/compromised RMs, I propose completely
removing all autogenerated files from the release tarballs, and
ensuring their content exactly matches the content of the
associated git tag (this means also removing the -dev prefix
from the version number in main/php_version.h, Zend/zend.h,
configure.ac and NEWS in the git tag). 
  
  Of course
this means that users will have to generate the build scripts
when compiling PHP, as when installing PHP from the VCS repo.
  
  
  I'm sending
a copy of this email to secur...@php.net as well.
Hey Daniil,

Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Daniil Gentili


That would break lots of tools as it requires extra dependencies so it 
is not something that would could in stable versions.


Btw, I do not believe that "it would require end users to install 
autotools and bison in order to compile PHP from tarballs" is valid 
reason to delay the patching of a serious attack vector ASAP.



Regards,

Daniil Gentili.


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Daniil Gentili

Hi,
It is also pretty standard thing to distribute configure files (which 
is the file that probably matters most).


The current standard way of distributing generated configure files in 
tarballs is precisely what allowed the xz supply chain attack to go 
unnoticed for so long.


I strongly believe all projects using autotools, including PHP, should 
switch away from this "standard" way of doing things.


Also don't forget that we need to also provide Windows builds which 
are binaries so we need some sort of verification of this type in any 
case.


Of course, build reproducibility is a very good thing, but when a user 
downloads a binary, they're aware that they're getting a compiled blob 
which might contain injected malicious code (especially if there's no 
build reproducibility); when a user downloads a source tarball, there's 
a false sense of security rooted in the mistaken belief that the source 
code in the tarball matches the one distributed in the VCS, but in 
reality, the tarball also contains potentially malicious semi-compiled 
blobs, not present in the VCS.


It would require compromising the CI as well as the download serves 
happening at the same time which seems to me like an impossible scenario.


I misunderstood your original message, I thought you meant that there 
would be some new CI system hosted on downloads.php.net dedicated to 
verifying, not the current GHA CI system, which is configured on the 
public VCS.


If GHA is used for verifying builds, that would make more sense, but 
then users would be required to check the status of a github pipeline to 
validate that the tarball was not compromised (or alternatively clone 
from source and re-generate the build scripts manually, or simply trust 
the release manager, which brings us to square 1).



Regards,

Daniil Gentili.


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Jakub Zelenka
On Sat, Mar 30, 2024 at 12:03 PM Jakub Zelenka  wrote:

> Hi,
>
> On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta  wrote:
>
>>
>>
>> On Sat, 30 Mar 2024, 05:19 Ben Ramsey,  wrote:
>>
>>> On Mar 29, 2024, at 20:20, Bob Weinand  wrote:
>>>
>>> 
>>> On 29.3.2024 23:31:26, Daniil Gentili wrote:
>>>
>>> In light of the recent supply chain attack in xz/lzma, leading to a
>>> backdoor in openSSH (
>>> https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe
>>> that it would be a good idea to remove the huge attack surface offered by
>>> the pre-generated autoconf build scripts and lexers, offered in the release
>>> tarballs.
>>>
>>> In particular, the xz supply chain attack injected the exploit with a
>>> few obfuscated lines, manually added to the end of the pre-generated
>>> configure script, that was only bundled in the tarballs.
>>>
>>> Even if the exploits themselves were committed to the repo in the form
>>> of test files, the code that actually injected the exploit in the library
>>> was not committed to the repo, and was only present in the pre-generated
>>> configure script in the tarball: this injection mode makes sense, as extra
>>> files in the tarball not present in the git repo would raise suspicions,
>>> but machine-generated configure scripts containing hundreds of thousands of
>>> lines of code not present in the upstream VCS are the norm, and are usually
>>> not checked before execution.
>>>
>>> Specifically in the case of PHP, along from the configure script, the
>>> tarball also bundles generated lexer files which contain actual C code,
>>> which is an additional attack vector, i.e. here's the diff between the
>>> tarball of the 8.3.4 release, and the PHP-8.3.4 tag on the git repo:
>>>
>>> ```
>>> ~ $ diff -r php-8.3.4 php-src -q
>>> Only in php-src:
>>> .git  Files
>>> php-8.3.4/NEWS and php-src/NEWS differ   Files
>>> php-8.3.4/Zend/zend.h and php-src/Zend/zend.h differ Only
>>> in php-8.3.4/Zend: zend_ini_parser.c
>>> Only in php-8.3.4/Zend: zend_ini_parser.h
>>> Only in php-8.3.4/Zend:
>>> zend_ini_parser.output Only in php-8.3.4/Zend:
>>> zend_ini_scanner.c
>>> Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
>>> Only in php-8.3.4/Zend:
>>> zend_language_parser.c Only in php-8.3.4/Zend:
>>> zend_language_parser.h Only in php-8.3.4/Zend:
>>> zend_language_parser.output
>>> Only in php-8.3.4/Zend: zend_language_scanner.c
>>> Only in php-8.3.4/Zend:
>>> zend_language_scanner_defs.h   Only in php-8.3.4:
>>> configure   Files php-8.3.4/
>>> configure.ac and php-src/configure.ac differ   Only in
>>> php-8.3.4/ext/json: json_parser.tab.c  Only in
>>> php-8.3.4/ext/json: json_parser.tab.h
>>> Only in php-8.3.4/ext/json: json_scanner.c
>>> Only in php-8.3.4/ext/json:
>>> php_json_scanner_defs.hOnly in php-8.3.4/ext/pdo:
>>> pdo_sql_parser.c
>>> Only in php-8.3.4/ext/phar:
>>> phar_path_check.c  Only in
>>> php-8.3.4/ext/standard: url_scanner_ex.c
>>> Only in php-8.3.4/ext/standard: var_unserializer.c
>>> Only in php-8.3.4/main: php_config.h.in
>>> Files php-8.3.4/main/php_version.h and php-src/main/php_version.h
>>> differ   Only in php-8.3.4/pear:
>>> install-pear-nozlib.phar   Only in
>>> php-8.3.4/sapi/phpdbg: phpdbg_lexer.c  Only in
>>> php-8.3.4/sapi/phpdbg: phpdbg_parser.c Only in
>>> php-8.3.4/sapi/phpdbg: phpdbg_parser.h
>>> Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output
>>> ```
>>>
>>> To prevent attacks from malevolent/compromised RMs, I propose completely
>>> removing all autogenerated files from the release tarballs, and ensuring
>>> their content exactly matches the content of the associated git tag (this
>>> means also removing the -dev prefix from the version number in
>>> main/php_version.h, Zend/zend.h, configure.ac and NEWS in the git tag).
>>>
>>> Of course this means that users will have to generate the build scripts
>>> when compiling PHP, as when installing PHP from the VCS repo.
>>>
>>> I'm sending a copy of this email to secur...@php.net as well.
>>>
>>> Hey Daniil,
>>>
>>> You can also have a public CI (i.e. a github action) generate the
>>> artifacts, along with hash computation.
>>> It should be a github action which runs on tags. This makes it fully
>>> verifiable; i.e. the code for the generation of action, including the hash.
>>> Anyone who wants can trivially trace this back.
>>>
>>> There's nothing in the tarballs which cannot be trivially automated and
>>> made verifiable.
>>>
>>> I don't think providing pre-generated files is fundamentally flawed, the
>>> primary lacking thing is verifiability. Which is also what enabled the xz
>>> 

Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Jakub Zelenka
Hi

On Sat, Mar 30, 2024 at 1:39 PM Daniil Gentili 
wrote:

> Hi,
>
>
> >The idea is that we would setup worklfow on CI that would run on tag push
> and it would call (authenticated https request) downloads.php.net server
> that could do the actual build
>
> I strongly believe that source tarballs should contain *only* the source
> code contained in the VCS.
>
That would break lots of tools as it requires extra dependencies so it is
not something that would could in stable versions. It is also pretty
standard thing to distribute configure files (which is the file that
probably matters most). Also don't forget that we need to also provide
Windows builds which are binaries so we need some sort of verification of
this type in any case.

>
> Distributing "half-built" source code (even if it's generated by a CI, and
> especially by a build server on downloads.php.net, which can be
> compromised) defeats the reproducibility and transparency purposes of
> building from source.
>

It would require compromising the CI as well as the download serves
happening at the same time which seems to me like an impossible scenario.

Regards

Jakub


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Tim Düsterhus

Hi

On 3/30/24 14:20, Stanislav Malyshev wrote:

But does the release manager generate the files (and the tarball) in a
reproducible way?


I understand that's what ./scripts/dev/makedist and
./scripts/dev/genfiles do, but I suspect exact bits in resulting
configure and lexers may depend on the exact version of tools & utils
used. For upstream packagers like distros I'd likely recommend using
these tools directly anyway, and not rely on what's in the package.



I've made some improvements to the 'makedist' script last year to 
improve reproducibility [1], but they are not fully reproducible yet.


Notably the timestamps within the .tar archive are not reproducible yet: 
https://github.com/php/php-src/blob/186465b1ddcf203ddffb5d24bae897508c711586/scripts/dev/makedist#L169-L172


They are set to the time the script is run, but should probably be 
derived from the time of the current commit instead. Likewise the gzip 
call does not have the -n flag and thus also embeds a timestamp into the 
.tar.gz archive.


There are probably further bits that are not reproducible yet.

Best regards
Tim Düsterhus

[1] https://github.com/php/php-src/pull/10613 and 
https://github.com/php/php-src/pull/10615


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Daniil Gentili

Hi,


>The idea is that we would setup worklfow on CI that would run on tag 
push and it would call (authenticated https request) downloads.php.net 
 server that could do the actual build


I strongly believe that source tarballs should contain *only* the source 
code contained in the VCS.


Distributing "half-built" source code (even if it's generated by a CI, 
and especially by a build server on downloads.php.net, which can be 
compromised) defeats the reproducibility and transparency purposes of 
building from source.


> For upstream packagers like distros I'd likely recommend using these 
tools directly anyway, and not rely on what's in the package.


Distros like arch linux already re-generate the configure scripts from 
scratch, but I believe that no distinction should be made, *everyone* 
should get a tarball containing *only* the bare source code, without 
leaving to the user the choice to re-generate the build files, or use a 
potentially compromised build script.



Regards,

Daniil Gentili.


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Stanislav Malyshev

Hi!

On 3/30/24 1:27 AM, Sebastian Bergmann wrote:

Am 30.03.2024 um 05:17 schrieb Ben Ramsey:
This is also why our release managers sign the tarballs with their own 
GPG keys, after generating the artifacts. This verifies the release 
manager was the one who generated the files.


But does the release manager generate the files (and the tarball) in a 
reproducible way?


I understand that's what ./scripts/dev/makedist and 
./scripts/dev/genfiles do, but I suspect exact bits in resulting 
configure and lexers may depend on the exact version of tools & utils 
used. For upstream packagers like distros I'd likely recommend using 
these tools directly anyway, and not rely on what's in the package.


--
Stas Malyshev
smalys...@gmail.com


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Jakub Zelenka
Hi,

On Sat, Mar 30, 2024 at 7:08 AM Marco Pivetta  wrote:

>
>
> On Sat, 30 Mar 2024, 05:19 Ben Ramsey,  wrote:
>
>> On Mar 29, 2024, at 20:20, Bob Weinand  wrote:
>>
>> 
>> On 29.3.2024 23:31:26, Daniil Gentili wrote:
>>
>> In light of the recent supply chain attack in xz/lzma, leading to a
>> backdoor in openSSH (
>> https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe
>> that it would be a good idea to remove the huge attack surface offered by
>> the pre-generated autoconf build scripts and lexers, offered in the release
>> tarballs.
>>
>> In particular, the xz supply chain attack injected the exploit with a few
>> obfuscated lines, manually added to the end of the pre-generated configure
>> script, that was only bundled in the tarballs.
>>
>> Even if the exploits themselves were committed to the repo in the form of
>> test files, the code that actually injected the exploit in the library was
>> not committed to the repo, and was only present in the pre-generated
>> configure script in the tarball: this injection mode makes sense, as extra
>> files in the tarball not present in the git repo would raise suspicions,
>> but machine-generated configure scripts containing hundreds of thousands of
>> lines of code not present in the upstream VCS are the norm, and are usually
>> not checked before execution.
>>
>> Specifically in the case of PHP, along from the configure script, the
>> tarball also bundles generated lexer files which contain actual C code,
>> which is an additional attack vector, i.e. here's the diff between the
>> tarball of the 8.3.4 release, and the PHP-8.3.4 tag on the git repo:
>>
>> ```
>> ~ $ diff -r php-8.3.4 php-src -q
>> Only in php-src:
>> .git  Files
>> php-8.3.4/NEWS and php-src/NEWS differ   Files
>> php-8.3.4/Zend/zend.h and php-src/Zend/zend.h differ Only
>> in php-8.3.4/Zend: zend_ini_parser.c
>> Only in php-8.3.4/Zend: zend_ini_parser.h
>> Only in php-8.3.4/Zend:
>> zend_ini_parser.output Only in php-8.3.4/Zend:
>> zend_ini_scanner.c
>> Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
>> Only in php-8.3.4/Zend:
>> zend_language_parser.c Only in php-8.3.4/Zend:
>> zend_language_parser.h Only in php-8.3.4/Zend:
>> zend_language_parser.output
>> Only in php-8.3.4/Zend: zend_language_scanner.c
>> Only in php-8.3.4/Zend:
>> zend_language_scanner_defs.h   Only in php-8.3.4:
>> configure   Files php-8.3.4/
>> configure.ac and php-src/configure.ac differ   Only in
>> php-8.3.4/ext/json: json_parser.tab.c  Only in
>> php-8.3.4/ext/json: json_parser.tab.h
>> Only in php-8.3.4/ext/json: json_scanner.c
>> Only in php-8.3.4/ext/json:
>> php_json_scanner_defs.hOnly in php-8.3.4/ext/pdo:
>> pdo_sql_parser.c
>> Only in php-8.3.4/ext/phar:
>> phar_path_check.c  Only in
>> php-8.3.4/ext/standard: url_scanner_ex.c
>> Only in php-8.3.4/ext/standard: var_unserializer.c
>> Only in php-8.3.4/main: php_config.h.in
>> Files php-8.3.4/main/php_version.h and php-src/main/php_version.h
>> differ   Only in php-8.3.4/pear:
>> install-pear-nozlib.phar   Only in
>> php-8.3.4/sapi/phpdbg: phpdbg_lexer.c  Only in
>> php-8.3.4/sapi/phpdbg: phpdbg_parser.c Only in
>> php-8.3.4/sapi/phpdbg: phpdbg_parser.h
>> Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output
>> ```
>>
>> To prevent attacks from malevolent/compromised RMs, I propose completely
>> removing all autogenerated files from the release tarballs, and ensuring
>> their content exactly matches the content of the associated git tag (this
>> means also removing the -dev prefix from the version number in
>> main/php_version.h, Zend/zend.h, configure.ac and NEWS in the git tag).
>>
>> Of course this means that users will have to generate the build scripts
>> when compiling PHP, as when installing PHP from the VCS repo.
>>
>> I'm sending a copy of this email to secur...@php.net as well.
>>
>> Hey Daniil,
>>
>> You can also have a public CI (i.e. a github action) generate the
>> artifacts, along with hash computation.
>> It should be a github action which runs on tags. This makes it fully
>> verifiable; i.e. the code for the generation of action, including the hash.
>> Anyone who wants can trivially trace this back.
>>
>> There's nothing in the tarballs which cannot be trivially automated and
>> made verifiable.
>>
>> I don't think providing pre-generated files is fundamentally flawed, the
>> primary lacking thing is verifiability. Which is also what enabled the xz
>> backdoor.
>>
>> Bob
>>
>>
>> This is also why our release managers sign the tarballs with their own
>> GPG keys, after generating the artifacts. This verifies the 

Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Sebastian Bergmann

Am 30.03.2024 um 05:17 schrieb Ben Ramsey:
This is also why our release managers sign the tarballs with their own GPG 
keys, after generating the artifacts. This verifies the release manager 
was the one who generated the files.


But does the release manager generate the files (and the tarball) in a 
reproducible way?


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-30 Thread Marco Pivetta
On Sat, 30 Mar 2024, 05:19 Ben Ramsey,  wrote:

> On Mar 29, 2024, at 20:20, Bob Weinand  wrote:
>
> 
> On 29.3.2024 23:31:26, Daniil Gentili wrote:
>
> In light of the recent supply chain attack in xz/lzma, leading to a
> backdoor in openSSH (
> https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe that
> it would be a good idea to remove the huge attack surface offered by the
> pre-generated autoconf build scripts and lexers, offered in the release
> tarballs.
>
> In particular, the xz supply chain attack injected the exploit with a few
> obfuscated lines, manually added to the end of the pre-generated configure
> script, that was only bundled in the tarballs.
>
> Even if the exploits themselves were committed to the repo in the form of
> test files, the code that actually injected the exploit in the library was
> not committed to the repo, and was only present in the pre-generated
> configure script in the tarball: this injection mode makes sense, as extra
> files in the tarball not present in the git repo would raise suspicions,
> but machine-generated configure scripts containing hundreds of thousands of
> lines of code not present in the upstream VCS are the norm, and are usually
> not checked before execution.
>
> Specifically in the case of PHP, along from the configure script, the
> tarball also bundles generated lexer files which contain actual C code,
> which is an additional attack vector, i.e. here's the diff between the
> tarball of the 8.3.4 release, and the PHP-8.3.4 tag on the git repo:
>
> ```
> ~ $ diff -r php-8.3.4 php-src -q
> Only in php-src: .git
> Files php-8.3.4/NEWS and php-src/NEWS differ
> Files php-8.3.4/Zend/zend.h and php-src/Zend/zend.h differ
> Only in php-8.3.4/Zend: zend_ini_parser.c
> Only in php-8.3.4/Zend: zend_ini_parser.h
> Only in php-8.3.4/Zend: zend_ini_parser.output
> Only in php-8.3.4/Zend: zend_ini_scanner.c
> Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
> Only in php-8.3.4/Zend: zend_language_parser.c
> Only in php-8.3.4/Zend: zend_language_parser.h
> Only in php-8.3.4/Zend: zend_language_parser.output
> Only in php-8.3.4/Zend: zend_language_scanner.c
> Only in php-8.3.4/Zend: zend_language_scanner_defs.h
> Only in php-8.3.4: configure
> Files php-8.3.4/configure.ac and php-src/configure.ac
> differ   Only in php-8.3.4/ext/json:
> json_parser.tab.c  Only in php-8.3.4/ext/json:
> json_parser.tab.h
> Only in php-8.3.4/ext/json: json_scanner.c
> Only in php-8.3.4/ext/json: php_json_scanner_defs.h
> Only in php-8.3.4/ext/pdo: pdo_sql_parser.c
> Only in php-8.3.4/ext/phar: phar_path_check.c
> Only in php-8.3.4/ext/standard: url_scanner_ex.c
> Only in php-8.3.4/ext/standard: var_unserializer.c
> Only in php-8.3.4/main: php_config.h.in
> Files php-8.3.4/main/php_version.h and php-src/main/php_version.h differ
> Only in php-8.3.4/pear: install-pear-nozlib.phar
> Only in php-8.3.4/sapi/phpdbg: phpdbg_lexer.c
> Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.c
> Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.h
> Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output
> ```
>
> To prevent attacks from malevolent/compromised RMs, I propose completely
> removing all autogenerated files from the release tarballs, and ensuring
> their content exactly matches the content of the associated git tag (this
> means also removing the -dev prefix from the version number in
> main/php_version.h, Zend/zend.h, configure.ac and NEWS in the git tag).
>
> Of course this means that users will have to generate the build scripts
> when compiling PHP, as when installing PHP from the VCS repo.
>
> I'm sending a copy of this email to secur...@php.net as well.
>
> Hey Daniil,
>
> You can also have a public CI (i.e. a github action) generate the
> artifacts, along with hash computation.
> It should be a github action which runs on tags. This makes it fully
> verifiable; i.e. the code for the generation of action, including the hash.
> Anyone who wants can trivially trace this back.
>
> There's nothing in the tarballs which cannot be trivially automated and
> made verifiable.
>
> I don't think providing pre-generated files is fundamentally flawed, the
> primary lacking thing is verifiability. Which is also what enabled the xz
> backdoor.
>
> Bob
>
>
> This is also why our release managers sign the tarballs with their own GPG
> keys, after generating the artifacts. This verifies the release manager was
> the one who generated the files.
>
> Cheers,
> Ben
>

Hey Ben,

I understand that the XZ project had signed releases too: that still means
that downstream consumers would need to trust the release managers anyway,
and reproduce the whole chain themselves.

I suppose that's part of OP's concern.


Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-29 Thread Ben Ramsey
On Mar 29, 2024, at 20:20, Bob Weinand  wrote:

  
  
On 29.3.2024 23:31:26, Daniil Gentili
  wrote:


  
  
  In light of
the recent supply chain attack in xz/lzma, leading to a backdoor
in openSSH
(https://www.openwall.com/lists/oss-security/2024/03/29/4), I
believe that it would be a good idea to remove the huge attack
surface offered by the pre-generated autoconf build scripts and
lexers, offered in the release tarballs. 
  
  In
particular, the xz supply chain attack injected the exploit with
a few obfuscated lines, manually added to the end of the
pre-generated configure script, that was only bundled in the
tarballs. 
  
  Even if the
exploits themselves were committed to the repo in the form of
test files, the code that actually injected the exploit in the
library was not committed to the repo, and was only present in
the pre-generated configure script in the tarball: this
injection mode makes sense, as extra files in the tarball not
present in the git repo would raise suspicions, but
machine-generated configure scripts containing hundreds of
thousands of lines of code not present in the upstream VCS are
the norm, and are usually not checked before execution. 
  
  Specifically
in the case of PHP, along from the configure script, the tarball
also bundles generated lexer files which contain actual C code,
which is an additional attack vector, i.e. here's the diff
between the tarball of the 8.3.4 release, and the PHP-8.3.4 tag
on the git repo: 
  
  ```
  
  ~ $ diff -r
php-8.3.4 php-src -q 
  Only in
php-src:
.git  Files
php-8.3.4/NEWS and php-src/NEWS
differ   Files php-8.3.4/Zend/zend.h
and php-src/Zend/zend.h differ Only in
php-8.3.4/Zend: zend_ini_parser.c 
  Only in
php-8.3.4/Zend: zend_ini_parser.h 
  Only in
php-8.3.4/Zend:
zend_ini_parser.output Only in
php-8.3.4/Zend: zend_ini_scanner.c 
  Only in
php-8.3.4/Zend: zend_ini_scanner_defs.h 
  Only in
php-8.3.4/Zend:
zend_language_parser.c Only in
php-8.3.4/Zend:
zend_language_parser.h Only in
php-8.3.4/Zend: zend_language_parser.output 
  Only in
php-8.3.4/Zend: zend_language_scanner.c 
  Only in
php-8.3.4/Zend:
zend_language_scanner_defs.h   Only in
php-8.3.4:
configure   Files
php-8.3.4/configure.ac and php-src/configure.ac
differ   Only in php-8.3.4/ext/json:
json_parser.tab.c  Only in
php-8.3.4/ext/json: json_parser.tab.h 
  Only in
php-8.3.4/ext/json: json_scanner.c 
  Only in
php-8.3.4/ext/json:
php_json_scanner_defs.h    Only in
php-8.3.4/ext/pdo: pdo_sql_parser.c 
  Only in
php-8.3.4/ext/phar:
phar_path_check.c  Only in
php-8.3.4/ext/standard: url_scanner_ex.c 
  Only in
php-8.3.4/ext/standard: var_unserializer.c 
  Only in
php-8.3.4/main: php_config.h.in 
  Files
php-8.3.4/main/php_version.h and php-src/main/php_version.h
differ   Only in php-8.3.4/pear:
install-pear-nozlib.phar   Only in
php-8.3.4/sapi/phpdbg:
phpdbg_lexer.c  Only in
php-8.3.4/sapi/phpdbg:
phpdbg_parser.c Only in
php-8.3.4/sapi/phpdbg: phpdbg_parser.h 
  Only in
php-8.3.4/sapi/phpdbg: phpdbg_parser.output 
  ```
  
  
  To prevent
attacks from malevolent/compromised RMs, I propose completely
removing all autogenerated files from the release tarballs, and
ensuring their content exactly matches the content of the
associated git tag (this means also removing the -dev prefix
from the version number in main/php_version.h, Zend/zend.h,
configure.ac and NEWS in the git tag). 
  
  Of course
this means that users will have to generate the build scripts
when compiling PHP, as when installing PHP from the VCS repo.
  
  
  I'm sending
a copy of this email to secur...@php.net as well.
Hey Daniil,
You can also have a public CI (i.e. a github action) generate the
  artifacts, along with hash computation.
  It should be a github action which runs on tags. This makes it
  fully verifiable; i.e. 

Re: [PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-29 Thread Bob Weinand

On 29.3.2024 23:31:26, Daniil Gentili wrote:
In light of the recent supply chain attack in xz/lzma, leading to a 
backdoor in openSSH 
(https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe 
that it would be a good idea to remove the huge attack surface offered 
by the pre-generated autoconf build scripts and lexers, offered in the 
release tarballs.


In particular, the xz supply chain attack injected the exploit with a 
few obfuscated lines, manually added to the end of the pre-generated 
configure script, that was only bundled in the tarballs.


Even if the exploits themselves were committed to the repo in the form 
of test files, the code that actually injected the exploit in the 
library was not committed to the repo, and was only present in the 
pre-generated configure script in the tarball: this injection mode 
makes sense, as extra files in the tarball not present in the git repo 
would raise suspicions, but machine-generated configure scripts 
containing hundreds of thousands of lines of code not present in the 
upstream VCS are the norm, and are usually not checked before execution.


Specifically in the case of PHP, along from the configure script, the 
tarball also bundles generated lexer files which contain actual C 
code, which is an additional attack vector, i.e. here's the diff 
between the tarball of the 8.3.4 release, and the PHP-8.3.4 tag on the 
git repo:


```
~ $ diff -r php-8.3.4 php-src -q
Only in php-src: 
.git  Files 
php-8.3.4/NEWS and php-src/NEWS differ   
Files php-8.3.4/Zend/zend.h and php-src/Zend/zend.h 
differ Only in php-8.3.4/Zend: zend_ini_parser.c

Only in php-8.3.4/Zend: zend_ini_parser.h
Only in php-8.3.4/Zend: 
zend_ini_parser.output Only in 
php-8.3.4/Zend: zend_ini_scanner.c

Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
Only in php-8.3.4/Zend: 
zend_language_parser.c Only in 
php-8.3.4/Zend: zend_language_parser.h 
Only in php-8.3.4/Zend: zend_language_parser.output

Only in php-8.3.4/Zend: zend_language_scanner.c
Only in php-8.3.4/Zend: 
zend_language_scanner_defs.h   Only in php-8.3.4: 
configure   Files 
php-8.3.4/configure.ac and php-src/configure.ac differ   
Only in php-8.3.4/ext/json: 
json_parser.tab.c  Only in 
php-8.3.4/ext/json: json_parser.tab.h

Only in php-8.3.4/ext/json: json_scanner.c
Only in php-8.3.4/ext/json: 
php_json_scanner_defs.h    Only in 
php-8.3.4/ext/pdo: pdo_sql_parser.c
Only in php-8.3.4/ext/phar: 
phar_path_check.c  Only in 
php-8.3.4/ext/standard: url_scanner_ex.c

Only in php-8.3.4/ext/standard: var_unserializer.c
Only in php-8.3.4/main: php_config.h.in
Files php-8.3.4/main/php_version.h and php-src/main/php_version.h 
differ   Only in php-8.3.4/pear: 
install-pear-nozlib.phar   Only in 
php-8.3.4/sapi/phpdbg: phpdbg_lexer.c  
Only in php-8.3.4/sapi/phpdbg: 
phpdbg_parser.c Only in 
php-8.3.4/sapi/phpdbg: phpdbg_parser.h

Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output
```

To prevent attacks from malevolent/compromised RMs, I propose 
completely removing all autogenerated files from the release tarballs, 
and ensuring their content exactly matches the content of the 
associated git tag (this means also removing the -dev prefix from the 
version number in main/php_version.h, Zend/zend.h, configure.ac and 
NEWS in the git tag).


Of course this means that users will have to generate the build 
scripts when compiling PHP, as when installing PHP from the VCS repo.


I'm sending a copy of this email to secur...@php.net as well.


Hey Daniil,

You can also have a public CI (i.e. a github action) generate the 
artifacts, along with hash computation.
It should be a github action which runs on tags. This makes it fully 
verifiable; i.e. the code for the generation of action, including the 
hash. Anyone who wants can trivially trace this back.


There's nothing in the tarballs which cannot be trivially automated and 
made verifiable.


I don't think providing pre-generated files is fundamentally flawed, the 
primary lacking thing is verifiability. Which is also what enabled the 
xz backdoor.


Bob


[PHP-DEV] Consider removing autogenerated files from tarballs

2024-03-29 Thread Daniil Gentili
In light of the recent supply chain attack in xz/lzma, leading to a backdoor in 
openSSH (https://www.openwall.com/lists/oss-security/2024/03/29/4), I believe 
that it would be a good idea to remove the huge attack surface offered by the 
pre-generated autoconf build scripts and lexers, offered in the release 
tarballs.

In particular, the xz supply chain attack injected the exploit with a few 
obfuscated lines, manually added to the end of the pre-generated configure 
script, that was only bundled in the tarballs.

Even if the exploits themselves were committed to the repo in the form of test 
files, the code that actually injected the exploit in the library was not 
committed to the repo, and was only present in the pre-generated configure 
script in the tarball: this injection mode makes sense, as extra files in the 
tarball not present in the git repo would raise suspicions, but 
machine-generated configure scripts containing hundreds of thousands of lines 
of code not present in the upstream VCS are the norm, and are usually not 
checked before execution.

Specifically in the case of PHP, along from the configure script, the tarball 
also bundles generated lexer files which contain actual C code, which is an 
additional attack vector, i.e. here's the diff between the tarball of the 8.3.4 
release, and the PHP-8.3.4 tag on the git repo:

```
~ $ diff -r php-8.3.4 php-src -q
Only in php-src: .git  
Files php-8.3.4/NEWS and php-src/NEWS differ   
Files php-8.3.4/Zend/zend.h and php-src/Zend/zend.h differ Only 
in php-8.3.4/Zend: zend_ini_parser.c
Only in php-8.3.4/Zend: zend_ini_parser.h
Only in php-8.3.4/Zend: zend_ini_parser.output Only 
in php-8.3.4/Zend: zend_ini_scanner.c
Only in php-8.3.4/Zend: zend_ini_scanner_defs.h
Only in php-8.3.4/Zend: zend_language_parser.c Only 
in php-8.3.4/Zend: zend_language_parser.h Only in 
php-8.3.4/Zend: zend_language_parser.output
Only in php-8.3.4/Zend: zend_language_scanner.c
Only in php-8.3.4/Zend: zend_language_scanner_defs.h   Only 
in php-8.3.4: configure   Files 
php-8.3.4/configure.ac and php-src/configure.ac differ   Only in 
php-8.3.4/ext/json: json_parser.tab.c  Only in 
php-8.3.4/ext/json: json_parser.tab.h
Only in php-8.3.4/ext/json: json_scanner.c
Only in php-8.3.4/ext/json: php_json_scanner_defs.h    Only 
in php-8.3.4/ext/pdo: pdo_sql_parser.c
Only in php-8.3.4/ext/phar: phar_path_check.c  Only 
in php-8.3.4/ext/standard: url_scanner_ex.c
Only in php-8.3.4/ext/standard: var_unserializer.c
Only in php-8.3.4/main: php_config.h.in
Files php-8.3.4/main/php_version.h and php-src/main/php_version.h differ   Only 
in php-8.3.4/pear: install-pear-nozlib.phar   Only in 
php-8.3.4/sapi/phpdbg: phpdbg_lexer.c  Only in 
php-8.3.4/sapi/phpdbg: phpdbg_parser.c Only in 
php-8.3.4/sapi/phpdbg: phpdbg_parser.h
Only in php-8.3.4/sapi/phpdbg: phpdbg_parser.output
```

To prevent attacks from malevolent/compromised RMs, I propose completely 
removing all autogenerated files from the release tarballs, and ensuring their 
content exactly matches the content of the associated git tag (this means also 
removing the -dev prefix from the version number in main/php_version.h, 
Zend/zend.h, configure.ac and NEWS in the git tag).

Of course this means that users will have to generate the build scripts when 
compiling PHP, as when installing PHP from the VCS repo.

I'm sending a copy of this email to secur...@php.net as well.