Re: Building packages three times in a row
[EMAIL PROTECTED] writes:
> On Mon, Sep 24, 2007 at 08:35:50PM +0200, Goswin von Brederlow wrote:
>> Some tools use randomization to get out of worst-case situations or for general optimization. For example, when you look for an optimal allocation of register usage you can do a search by picking a random register allocation and repeating that a few thousand times to find a suitable minimum. Or a randomized heap that gives you O(1) time for all operations instead of O(lg n). By requiring bit-to-bit identical results you eliminate all such randomness and could seriously hinder the algorithms available for tools.
>
> While I have a hard time understanding why true randomness is required to solve such problems, I have no problem accepting that practically this is a big obstacle.

You could probably fix the seed for the random number generator and thereby make the build fixed. But then some joker would construct source code that runs into the worst case with specifically that seed.

> It will not be trivial to establish the equivalence of two such pieces of object code. Thank you for sharing this insight. I think you have helped me to finally see the problem that others have pointed to. Taking etch as the example, could you give any idea what percentage of packages we might expect this to turn up in?

I think gcc was doing some randomization with register allocation in some cases. But I'm not sure if that is still the case with all the changes gcc has had over the years. But if so, you could get, for example, r0 and r1 interchanged in a function, causing a change in all opcodes involving them.

> In any case it sounds very much like this idea would be much harder to do than I had originally hoped :-(

It will be impossible if you don't exclude a lot of known timestamps.

> Regards,
> Paddy

Regards,
Goswin

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Building packages three times in a row
On Mon, 24 Sep 2007 02:13:32 +0200 Martin Uecker [EMAIL PROTECTED] wrote:
> The idea is not to replace hashes by bit-by-bit comparison, but to be able to *independently* reproduce binaries from source code in a bit-identical way.

And what is going to happen when I use gcc-4.2.2007foo and you use gcc-4.2.1 etc.? You have the .orig.tar.gz and you have the .diff.gz. The standard method is to compare the .orig.tar.gz and then use 'interdiff -z' against the new .diff.gz.

> Then third parties can recreate the binaries and publish recreated hashes.

Why? I see no benefit.

> If the recreated hashes are identical then you can be sure that nobody has tampered with the build process

You'll *only* get that if the build tools are identical - that isn't tampering, it is bug fixing. gcc is not bug-free, each new version can include new bugs or regressions - the same applies to autotools, dpkg, etc.

> and the binary is actually created from the unmodified sources.

== compare the .orig.tar.gz - nothing else is needed for that and all the current tools already handle this portion.

> The current scheme just protects against tampering after signing. That is actually not very much.

You have to trust a DD at some point. If you can't trust me to build packages properly, you'll just have to rebuild the entire archive yourself.

--
Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
Re: Building packages three times in a row
On 23/09/07 at 23:32 +0200, Martin Uecker wrote:
> Patrick Winnertz wrote:
>> On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
>>>> Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?
>>>
>>> Hopefully they don't encode the build-time in the file list?
>>
>> We checked not for files which differ, but only for files which are missing in the first package or which are missing in the second package.
>
> I think it would be really cool if the Debian policy required that packages could be rebuilt bit-identically from source. At the moment, it is impossible to independently verify the integrity of binary packages.

We are currently very far from that. If you want to go in that direction, you have to find a several-step process that would get us there.

I compared the result of one build with the result of a package built three times, using debdiff. This has several flaws:
- it only compared the list of files. If the same files are there, but with a totally different size, it won't notice.
- it didn't compare with what is in the archive: packages in the archive might be totally different, because they were built at a different time (with a different toolchain), or in a dirty environment.

Basically, the goal you should aim at is that rebuilding a package should generate binary packages similar enough to what's already in the archive.

Raphael's dpkg-shlibdeps work should also help with that, but it doesn't seem like #430367 has progressed recently?

--
| Lucas Nussbaum
| [EMAIL PROTECTED] http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |
Re: Building packages three times in a row
Martin Uecker [EMAIL PROTECTED] writes:
> Patrick Winnertz wrote:
>> On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
>>>> Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?
>>>
>>> Hopefully they don't encode the build-time in the file list?
>>
>> We checked not for files which differ, but only for files which are missing in the first package or which are missing in the second package.
>
> I think it would be really cool if the Debian policy required that packages could be rebuilt bit-identically from source. At the moment, it is impossible to independently verify the integrity of binary packages.
>
> Greetings,
> Martin

Some tools use randomization to get out of worst-case situations or for general optimization. For example, when you look for an optimal allocation of register usage you can do a search by picking a random register allocation and repeating that a few thousand times to find a suitable minimum. Or a randomized heap that gives you O(1) time for all operations instead of O(lg n).

By requiring bit-to-bit identical results you eliminate all such randomness and could seriously hinder the algorithms available for tools.

Plus any bugfix in a tool will likely break it anyway, as mentioned in other mails.

Regards,
Goswin
Re: Building packages three times in a row
On Mon, Sep 24, 2007 at 08:35:50PM +0200, Goswin von Brederlow wrote:
> Some tools use randomization to get out of worst-case situations or for general optimization. For example, when you look for an optimal allocation of register usage you can do a search by picking a random register allocation and repeating that a few thousand times to find a suitable minimum. Or a randomized heap that gives you O(1) time for all operations instead of O(lg n). By requiring bit-to-bit identical results you eliminate all such randomness and could seriously hinder the algorithms available for tools.

While I have a hard time understanding why true randomness is required to solve such problems, I have no problem accepting that practically this is a big obstacle. It will not be trivial to establish the equivalence of two such pieces of object code.

Thank you for sharing this insight. I think you have helped me to finally see the problem that others have pointed to.

Taking etch as the example, could you give any idea what percentage of packages we might expect this to turn up in?

In any case it sounds very much like this idea would be much harder to do than I had originally hoped :-(

Regards,
Paddy
Re: Building packages three times in a row
On Mon, Sep 24, 2007 at 08:35:50PM +0200, Goswin von Brederlow wrote:
> Martin Uecker [EMAIL PROTECTED] writes:
>> I think it would be really cool if the Debian policy required that packages could be rebuilt bit-identically from source. At the moment, it is impossible to independently verify the integrity of binary packages.
>>
>> Greetings,
>> Martin
>
> Some tools use randomization to get out of worst-case situations or for general optimization. For example, when you look for an optimal allocation of register usage you can do a search by picking a random register allocation and repeating that a few thousand times to find a suitable minimum. Or a randomized heap that gives you O(1) time for all operations instead of O(lg n).

I do not know of any compiler which does register allocation like that. This would be a debugging nightmare!

> By requiring bit-to-bit identical results you eliminate all such randomness and could seriously hinder the algorithms available for tools.

Such algorithms would certainly use a pseudo-random number generator which could be seeded identically in each build.

Do you actually know about any tool which produces different output each time because of the use of a randomized algorithm?

Martin
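A minimal sketch of the seeding argument, in Python. It is only an illustration: the function name `randomized_search`, the toy cost list and the seed values are made up for this example and are not taken from gcc or any other tool mentioned in the thread. It shows that a randomized search driven by a pseudo-random generator with a fixed seed returns the same answer on every run.

```python
import random


def randomized_search(costs, seed, trials=1000):
    """Try random candidates and return the index of the cheapest one seen.

    A private generator seeded with a fixed value makes the sequence of
    candidates, and therefore the result, identical on every run.
    """
    rng = random.Random(seed)  # does not touch the global random state
    best = None
    for _ in range(trials):
        candidate = rng.randrange(len(costs))
        if best is None or costs[candidate] < costs[best]:
            best = candidate
    return best


# Hypothetical costs; indices 1 and 3 are equally good, so different seeds
# may legitimately pick different (but equivalent) answers.
costs = [7, 3, 9, 3, 5]
first_run = randomized_search(costs, seed=42)
second_run = randomized_search(costs, seed=42)
print(first_run == second_run)
```

The tie between indices 1 and 3 mirrors the r0/r1 case raised elsewhere in the thread: two runs with different seeds can both be optimal and still produce different output, which is why the seed has to be fixed if the output must match byte for byte.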
Re: Building packages three times in a row
Hello Goswin,

Goswin von Brederlow [EMAIL PROTECTED] wrote:
> Some tools use randomization to get out of worst-case situations or for general optimization. For example, when you look for an optimal allocation of register usage you can do a search by picking a random register allocation and repeating that a few thousand times to find a suitable minimum. Or a randomized heap that gives you O(1) time for all operations instead of O(lg n).

Do you have a name of such a tool? gcc?

Jörg.

--
Professor: 'God', an incomprehensible and mythical being that manifests itself once a week among mortals to bring wisdom on slides to the people. (Dschungelbuch 11, FSU Jena)
Re: Building packages three times in a row
Patrick Winnertz wrote:
> On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
>>> Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?
>>
>> Hopefully they don't encode the build-time in the file list?
>
> We checked not for files which differ, but only for files which are missing in the first package or which are missing in the second package.

I think it would be really cool if the Debian policy required that packages could be rebuilt bit-identically from source. At the moment, it is impossible to independently verify the integrity of binary packages.

Greetings,
Martin
Re: Building packages three times in a row
On Sun, 23 Sep 2007 23:32:59 +0200 Martin Uecker [EMAIL PROTECTED] wrote:
> Patrick Winnertz wrote:
>> On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
>>>> Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?
>>>
>>> Hopefully they don't encode the build-time in the file list?
>>
>> We checked not for files which differ, but only for files which are missing in the first package or which are missing in the second package.
>
> I think it would be really cool if the Debian policy required that packages could be rebuilt bit-identically from source. At the moment, it is impossible to independently verify the integrity of binary packages.

This has been covered before - certain upstream macros are among many factors that ensure that this is unlikely. I, for one, use such macros upstream to indicate the build time of the actual executable installed, so this will change the binary every time it is built.

You have md5sums and GnuPG signatures on the Release files - I see no benefit from bit-matching.

--
Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
Re: Building packages three times in a row
Neil Williams [EMAIL PROTECTED]:
> Martin Uecker [EMAIL PROTECTED] wrote:
> [...]
>> I think it would be really cool if the Debian policy required that packages could be rebuilt bit-identically from source. At the moment, it is impossible to independently verify the integrity of binary packages.
>
> This has been covered before - certain upstream macros are among many factors that ensure that this is unlikely. I, for one, use such macros upstream to indicate the build time of the actual executable installed, so this will change the binary every time it is built.

This could be fixed.

> You have md5sums and GnuPG signatures on the Release files - I see no benefit from bit-matching.

The build host could be compromised. Not that unlikely.

Martin
Re: Building packages three times in a row
On Mon, Sep 24, 2007 at 12:54:58AM +0200, Martin Uecker wrote:
> Neil Williams [EMAIL PROTECTED]:
>> This has been covered before - certain upstream macros are among many factors that ensure that this is unlikely. I, for one, use such macros upstream to indicate the build time of the actual executable installed, so this will change the binary every time it is built.
>
> This could be fixed.

In every binary that includes the build date in it? There's rather a lot; off the top of my head, Vim does it, and so does the Linux kernel AFAIK.

>> You have md5sums and GnuPG signatures on the Release files - I see no benefit from bit-matching.
>
> The build host could be compromised. Not that unlikely.

And if the build host was compromised, how would that help any more than md5sums and gpg-signing? With access to the build host, whatever list of bits to match could be changed along with the binary, the md5sum, and the gpg-signature.

Anyway, surely the point of hashes like md5, sha1, etc. is that it's much faster to do that than to compare large files bit by bit?

--
Benjamin A'Lee [EMAIL PROTECTED]
http://subvert.org.uk/~bma/
Re: Building packages three times in a row
Benjamin A'Lee [EMAIL PROTECTED]:
> On Mon, Sep 24, 2007 at 12:54:58AM +0200, Martin Uecker wrote:
>> Neil Williams [EMAIL PROTECTED]:
>>> This has been covered before - certain upstream macros are among many factors that ensure that this is unlikely. I, for one, use such macros upstream to indicate the build time of the actual executable installed, so this will change the binary every time it is built.
>>
>> This could be fixed.
>
> In every binary that includes the build date in it? There's rather a lot; off the top of my head, Vim does it, and so does the Linux kernel AFAIK.

I know. In a world where providing a correctly working clean target is already an issue, that's pretty far-fetched. But IMHO being able to recreate binaries from source code in a reproducible way would be a milestone for security and QA.

>>> You have md5sums and GnuPG signatures on the Release files - I see no benefit from bit-matching.
>>
>> The build host could be compromised. Not that unlikely.
>
> And if the build host was compromised, how would that help any more than md5sums and gpg-signing? With access to the build host, whatever list of bits to match could be changed along with the binary, the md5sum, and the gpg-signature.
>
> Anyway, surely the point of hashes like md5, sha1, etc. is that it's much faster to do that than to compare large files bit by bit?

The idea is not to replace hashes by bit-by-bit comparison, but to be able to *independently* reproduce binaries from source code in a bit-identical way. Then third parties can recreate the binaries and publish recreated hashes. If the recreated hashes are identical, then you can be sure that nobody has tampered with the build process and that the binary is actually created from the unmodified sources.

The current scheme just protects against tampering after signing. That is actually not very much.

Martin
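The verification scheme Martin describes can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the helper names `digest` and `disagreeing_rebuilders`, the rebuilder names and the byte strings standing in for package contents are all hypothetical, and SHA-256 is used here instead of the md5sums mentioned in the thread. The comparison only means something if the build is bit-identical in the first place.

```python
import hashlib


def digest(package_bytes):
    """Return the SHA-256 hex digest of a binary package's content."""
    return hashlib.sha256(package_bytes).hexdigest()


def disagreeing_rebuilders(published_digest, rebuilt_digests):
    """Return the names of rebuilders whose digest differs from the published one.

    rebuilt_digests maps a rebuilder's name to the digest that rebuilder
    obtained by building the same source independently.
    """
    return sorted(
        name
        for name, rebuilt in rebuilt_digests.items()
        if rebuilt != published_digest
    )


# Stand-ins for the archive's binary and two independent rebuilds.
published = digest(b"contents of the archive's package")
rebuilds = {
    "rebuilder-a": digest(b"contents of the archive's package"),
    "rebuilder-b": digest(b"contents of a package that differs"),
}
print(disagreeing_rebuilders(published, rebuilds))
```

An empty result would mean every independent rebuild matched the published binary; any name in the list marks a rebuild that needs investigation, whether the cause is tampering, a different toolchain or an embedded timestamp.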
Re: Building packages three times in a row
On Tue, Sep 18, 2007 at 20:49:03 +0200, Soeren Sonnenburg wrote:
> On Mon, 2007-09-10 at 22:34 +0200, Patrick Winnertz wrote:
>> Hi,
>> [...]
>> Furthermore we detected some issues with different package content (compared to the first build) after the second and third build. These bugs will have Severity: Serious.
>
> Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?

Hopefully they don't encode the build-time in the file list?

Cheers,
Julien
Re: Building packages three times in a row
On Mon, 2007-09-10 at 22:34 +0200, Patrick Winnertz wrote:
> Hi,
> [...]
> Furthermore we detected some issues with different package content (compared to the first build) after the second and third build. These bugs will have Severity: Serious.

Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?

Soeren

--
Sometimes, there's a moment as you're waking, when you become aware of the real world around you, but you're still dreaming.
Re: Building packages three times in a row
On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
>> Hmmhh, what do you do about programs etc. that encode the build-time in the binary? I mean they obviously will change between builds?
>
> Hopefully they don't encode the build-time in the file list?

We checked not for files which differ, but only for files which are missing in the first package or which are missing in the second package.

For example aptitude: in the second build all of the .mo files are missing. Maybe you get an idea if you look at the logs:
http://people.debian.org/~lucas/logs/2007/doublebuild-09-05/failed-debdiff/

FYI: I filed all FTBFS bugs on Sunday 16.10. Since this is about 10 days after the rebuild (I didn't have enough time in between), some packages were already fixed on Sunday. Sorry to the people who fixed these issues with a new upload between the rebuild and my bug reports ;-)

Greetings
Patrick

> Cheers,
> Julien

--
 .''`.   Patrick Winnertz [EMAIL PROTECTED]
: :' :   GNU/Linux Debian-Edu Developer
`. `'`   http://www.der-winnie.de http://d.skolelinux.org/~winnie
  `-     Debian - when you have better things to do than fixing systems
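The check Patrick describes, looking only for files missing from one package or the other, can be sketched in Python as a pair of set differences. This is an illustration and not the actual QA script or debdiff itself; the function name `compare_file_lists` and the example paths are hypothetical.

```python
def compare_file_lists(first_build, second_build):
    """Compare two package file lists by path only.

    Returns a pair: (paths only in the first build, paths only in the
    second build). File contents and sizes are deliberately not compared,
    so an embedded build time does not show up as a difference.
    """
    first, second = set(first_build), set(second_build)
    return sorted(first - second), sorted(second - first)


# Hypothetical file lists: the translation catalogue vanishes in build two.
first_build = ["usr/bin/example", "usr/share/locale/de/LC_MESSAGES/example.mo"]
second_build = ["usr/bin/example"]

missing_in_second, missing_in_first = compare_file_lists(first_build, second_build)
print(missing_in_second)
print(missing_in_first)
```

Because only paths are compared, this kind of check catches cases like the missing .mo files but stays silent when a file is present in both builds with different content.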
Re: Building packages three times in a row
On Mon, 10 Sep 2007 22:34:52 +0200 Patrick Winnertz [EMAIL PROTECTED] wrote:
> as a QA effort the whole archive was rebuilt over the weekend to catch build failures, checking whether a package can be built three times in a row (unpack, build, clean, build, clean, build).

What happens about false positives? No script is perfect - it appears that this script has got it wrong in the case of libgpeschedule at least.

> This is the second effort to get rid of those issues. The first effort was announced by Martin-Zobel Helas on 16 May 2007 [0].

Something went awry between the two because my packages were fine on the first one; now just one of them is reported to fail in a way that I simply cannot replicate.

> This must undo any effects that the build and binary targets may have had, except that it should leave alone any output files created in the parent directory by a run of a binary target.

AFAICT the clean target is fine.

> Please note that building a package twice in a row is a release goal for lenny.

And libgpeschedule does build two, three or more times in a row. I don't understand why the test routine shows a failure when AFAICT none exists. The build log makes no sense and appears to be incomplete, so I have no way of replicating the build and no way to fix it.

Unless someone can demonstrate whether something is actually going wrong and give me some ideas on how to fix it, I'm going to have to close 442636 as an artifact of a broken tool.

--
Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
Building packages three times in a row
Hi,

as a QA effort the whole archive was rebuilt over the weekend to catch build failures, checking whether a package can be built three times in a row (unpack, build, clean, build, clean, build).

This is the second effort to get rid of those issues. The first effort was announced by Martin-Zobel Helas on 16 May 2007 [0].

We again found about 400 packages not having a sane clean target. To cite http://www.debian.org/doc/debian-policy/ch-source.html#s-debianrules

  clean
    This must undo any effects that the build and binary targets may have had, except that it should leave alone any output files created in the parent directory by a run of a binary target.

We'll file bug reports against every package that FTBFS in this way with Severity: Important.

Furthermore we detected some issues with different package content (compared to the first build) after the second and third build. These bugs will have Severity: Serious.

Once we have finished, you'll find all the filed bugs either here [1] or here [2].

Please note that building a package twice in a row is a release goal for lenny.

Greetings
Patrick Winnertz

[0]:http://lists.debian.org/debian-devel/2007/05/msg00490.html
[1]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-doublebuild
[2]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-debdiff-differ

--
 .''`.   Patrick Winnertz [EMAIL PROTECTED]
: :' :   GNU/Linux Debian-Edu Developer
`. `'`   http://www.der-winnie.de http://d.skolelinux.org/~winnie
  `-     Debian - when you have better things to do than fixing systems
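The policy requirement quoted above (clean must undo the effects of build and binary) suggests one way to test a clean target: record the source tree before the build, record it again after clean, and compare. The Python sketch below is an assumption about how such a check could look and is not the script used for this rebuild; `snapshot` and `clean_leftovers` are hypothetical names, and only regular files inside the source tree are considered.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its content."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                content_hash = hashlib.sha256(handle.read()).hexdigest()
            result[os.path.relpath(path, root)] = content_hash
    return result


def clean_leftovers(before, after):
    """Compare two snapshots taken before the build and after clean.

    Returns three sorted lists: files added, files removed, and files
    whose content changed. All three are empty for a sane clean target.
    """
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return added, removed, changed
```

In use, one would call `snapshot` on the freshly unpacked source directory, run the build and then the clean target by whatever means the package provides, call `snapshot` again, and pass both results to `clean_leftovers`. Any non-empty list points at files the clean target failed to restore.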
Re: Building packages three times in a row
On Monday, 10 September 2007 22:34:52, Patrick Winnertz wrote:
> Hi,
>
> as a QA effort the whole archive was rebuilt over the weekend to catch build failures, checking whether a package can be built three times in a row (unpack, build, clean, build, clean, build).
>
> This is the second effort to get rid of those issues. The first effort was announced by Martin-Zobel Helas on 16 May 2007 [0].
>
> We again found about 400 packages not having a sane clean target. To cite http://www.debian.org/doc/debian-policy/ch-source.html#s-debianrules
>
>   clean
>     This must undo any effects that the build and binary targets may have had, except that it should leave alone any output files created in the parent directory by a run of a binary target.
>
> We'll file bug reports against every package that FTBFS in this way with Severity: Important.
>
> Furthermore we detected some issues with different package content (compared to the first build) after the second and third build. These bugs will have Severity: Serious.
>
> Once we have finished, you'll find all the filed bugs either here [1] or here [2].
>
> Please note that building a package twice in a row is a release goal for lenny.
>
> Greetings
> Patrick Winnertz
>
> [0]:http://lists.debian.org/debian-devel/2007/05/msg00490.html
> [1]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED] ebian.org;tag=qa-doublebuild
> [2]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED] ebian.org;tag=qa-debdiff-differ

Mmpf... I have to correct the third link... there is a small typo.

You can see the filed bug reports with a different package content after the build here:
http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-debdiff

Thanks.

Greetings
Patrick Winnertz

--
 .''`.   Patrick Winnertz [EMAIL PROTECTED]
: :' :   GNU/Linux Debian-Edu Developer
`. `'`   http://www.der-winnie.de http://d.skolelinux.org/~winnie
  `-     Debian - when you have better things to do than fixing systems