Re: Building packages three times in a row

2007-09-25 Thread Goswin von Brederlow
[EMAIL PROTECTED] writes:

 On Mon, Sep 24, 2007 at 08:35:50PM +0200, Goswin von Brederlow wrote:
 
 Some tools use randomization to escape worst-case situations or for
 general optimization. For example, when you look for an optimal
 register allocation, you can search by picking a random register
 allocation and repeating that a few thousand times to find a
 suitable minimum.

 Or a randomized heap that gives you O(1) time for
 all operations instead of O(lg n).
 
 By requiring bit-for-bit identical results, you eliminate all such
 randomness and could seriously restrict the algorithms available to
 tools.

 While I have a hard time understanding why true randomness is
 required to solve such problems, I have no problem accepting that 
 practically this is a big obstacle.

You could probably fix the seed of the random number generator and
thereby make the build reproducible. But then some joker would construct
source code that hits the worst case with exactly that seed.

 It will not be trivial to establish the equivalence of two such 
 pieces of object code.

 Thank you for sharing this insight. I think you have helped me to
 finally see the problem that others have pointed to.
  
 Taking etch as the example, could you give any idea what percentage
 of packages we might expect this to turn up in?

I think gcc was using some randomness in register allocation in some
cases. But I'm not sure whether that is still the case after all the
changes gcc has gone through over the years.

But if it is, you could, for example, get r0 and r1 interchanged in a
function, changing every opcode that involves them.
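To make the r0/r1 example concrete: two builds that differ only by such a swap are not byte-identical, yet they can be recognised as equivalent by checking for a consistent one-to-one renaming of registers. The following is a minimal sketch of that check under simplifying assumptions: the tuple instruction format and register names like `r0` are hypothetical, and it deliberately ignores instruction reordering, different instruction selection, and everything else a real compiler change can introduce.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified instruction format: (opcode, operand, operand, ...)
Instr = Tuple[str, ...]


def is_register(operand: str) -> bool:
    # Assumption: registers are spelled "r" followed by digits, e.g. "r0".
    return operand.startswith("r") and operand[1:].isdigit()


def equivalent_up_to_renaming(a: List[Instr], b: List[Instr]) -> bool:
    """True if b is a with its registers consistently renamed (a bijection)."""
    if len(a) != len(b):
        return False
    forward: Dict[str, str] = {}   # register in a -> register in b
    backward: Dict[str, str] = {}  # register in b -> register in a
    for ia, ib in zip(a, b):
        if len(ia) != len(ib) or ia[0] != ib[0]:
            return False  # different opcode or operand count
        for oa, ob in zip(ia[1:], ib[1:]):
            if is_register(oa) and is_register(ob):
                # Record the pairing and reject any conflict in either direction.
                if forward.setdefault(oa, ob) != ob or backward.setdefault(ob, oa) != oa:
                    return False
            elif oa != ob:
                return False  # immediates and other operands must match exactly
    return True


build1 = [("mov", "r0", "5"), ("add", "r1", "r0"), ("ret", "r1")]
build2 = [("mov", "r1", "5"), ("add", "r0", "r1"), ("ret", "r0")]
print(build1 == build2)                           # False
print(equivalent_up_to_renaming(build1, build2))  # True
```

Requiring the mapping to work in both directions matters: without the `backward` table, two different registers in one build could both map onto a single register in the other.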

 In any case it sounds very much like this idea would be much harder 
 to do than I had originally hoped :-(  

It will be impossible if you don't exclude a lot of known timestamps.

 Regards,
 Paddy

Regards
Goswin


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Building packages three times in a row

2007-09-24 Thread Neil Williams
On Mon, 24 Sep 2007 02:13:32 +0200
Martin Uecker [EMAIL PROTECTED] wrote:

 The idea is not to replace hashes by bit-by-bit comparison, but to
 be able to *independently* reproduce binaries from source code in
 a bit-identical way. 

And what is going to happen when I use gcc-4.2.2007foo and you use
gcc-4.2.1 etc.? You have the .orig.tar.gz and you have the .diff.gz.
The standard method is to compare the .orig.tar.gz and then use
'interdiff -z' against the new .diff.gz.

 Then third parties can recreate the binaries
 and publish recreated hashes. 

Why? I see no benefit.

 If the recreated hashes are identical
 then you can be sure that nobody has tampered with the build process

You'll *only* get that if the build tools are identical - that isn't
tampering, it is bug fixing. gcc is not bug-free; each new version can
include new bugs or regressions - the same applies to autotools, dpkg, etc.

 and the binary is actually created from the unmodified sources.

== compare the .orig.tar.gz - nothing else is needed for that and all
the current tools already handle this portion.

 The
 current scheme just protects against tampering after signing. That
 is actually not very much.

You have to trust a DD at some point. If you can't trust me to build
packages properly, you'll just have to rebuild the entire archive
yourself.

-- 

Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/




Re: Building packages three times in a row

2007-09-24 Thread Lucas Nussbaum
On 23/09/07 at 23:32 +0200, Martin Uecker wrote:
 
 Patrick Winnertz wrote:
  On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
Hmmhh, what do you do about programs etc. that encode the build-time in
the binary? I mean they obviously will change between builds?
  
   Hopefully they don't encode the build-time in the file list?
  We did not check for files that differ, only for files that are missing
  in the first package or missing in the second package.
 
 
 I think it would be really cool if the Debian policy required
 that packages could be rebuilt bit-identically from source. 
 At the moment, it is impossible to independently verify the
 integrity of binary packages.

We are currently very far from that. If you want to go in that direction,
you have to find a multi-step process that would get us there.

I compared the result of a single build with the result of a package built
three times, using debdiff. This has several flaws:

- it only compared the list of files. If the same files are there but
  with totally different sizes, it won't notice.

- it didn't compare with what is in the archive: packages in the archive
  might be totally different, because they were built at a different
  time (with a different toolchain), or in a dirty environment.
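The first flaw is cheap to address: comparing sizes as well as paths catches files that exist in both builds but changed. The following minimal sketch assumes both binary packages have already been unpacked into directories (for example with `dpkg-deb -x`); `manifest` and `compare_manifests` are hypothetical helpers, not part of debdiff. A size check still misses same-size changes, which a per-file hash would catch.

```python
import os
from typing import Dict, List


def manifest(root: str) -> Dict[str, int]:
    """Map each file under an unpacked package tree to its size in bytes."""
    sizes: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat so symlinks (common in packages) are measured, not followed.
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def compare_manifests(first: Dict[str, int], second: Dict[str, int]) -> Dict[str, List[str]]:
    """Report files missing from either build and files whose size changed."""
    common = first.keys() & second.keys()
    return {
        "only_in_first": sorted(first.keys() - second.keys()),
        "only_in_second": sorted(second.keys() - first.keys()),
        "size_changed": sorted(p for p in common if first[p] != second[p]),
    }


# Usage with two hypothetical unpacked trees:
# report = compare_manifests(manifest("build1"), manifest("build3"))
```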

Basically, the goal you should aim for is that rebuilding a package
generates binary packages similar enough to what's already in the
archive.

Raphael's dpkg-shlibdeps work should also help with that, but it doesn't
seem like #430367 has progressed recently?
-- 
| Lucas Nussbaum
| [EMAIL PROTECTED]   http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |




Re: Building packages three times in a row

2007-09-24 Thread Goswin von Brederlow
Martin Uecker [EMAIL PROTECTED] writes:

 Patrick Winnertz wrote:
 On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
   Hmmhh, what do you do about programs etc. that encode the build-time in
   the binary? I mean they obviously will change between builds?
 
  Hopefully they don't encode the build-time in the file list?
 We did not check for files that differ, only for files that are missing
 in the first package or missing in the second package.


 I think it would be really cool if the Debian policy required
 that packages could be rebuilt bit-identically from source. 
 At the moment, it is impossible to independently verify the
 integrity of binary packages.


 Greetings,
 Martin

Some tools use randomization to escape worst-case situations or for
general optimization. For example, when you look for an optimal
register allocation, you can search by picking a random register
allocation and repeating that a few thousand times to find a
suitable minimum. Or a randomized heap that gives you O(1) time for
all operations instead of O(lg n).

By requiring bit-for-bit identical results, you eliminate all such
randomness and could seriously restrict the algorithms available to
tools.

Plus, any bug fix in a tool will likely break it anyway, as mentioned in
other mails.

Regards
Goswin





Re: Building packages three times in a row

2007-09-24 Thread paddy
On Mon, Sep 24, 2007 at 08:35:50PM +0200, Goswin von Brederlow wrote:
 
 Some tools use randomization to escape worst-case situations or for
 general optimization. For example, when you look for an optimal
 register allocation, you can search by picking a random register
 allocation and repeating that a few thousand times to find a
 suitable minimum.

 Or a randomized heap that gives you O(1) time for
 all operations instead of O(lg n).
 
 By requiring bit-for-bit identical results, you eliminate all such
 randomness and could seriously restrict the algorithms available to
 tools.

While I have a hard time understanding why true randomness is
required to solve such problems, I have no problem accepting that 
practically this is a big obstacle.

It will not be trivial to establish the equivalence of two such 
pieces of object code.

Thank you for sharing this insight. I think you have helped me to
finally see the problem that others have pointed to.
 
Taking etch as the example, could you give any idea what percentage
of packages we might expect this to turn up in?

In any case it sounds very much like this idea would be much harder 
to do than I had originally hoped :-(  

Regards,
Paddy





Re: Building packages three times in a row

2007-09-24 Thread Martin Uecker
On Mon, Sep 24, 2007 at 08:35:50PM +0200, Goswin von Brederlow wrote:
 Martin Uecker [EMAIL PROTECTED] writes:

  I think it would be really cool if the Debian policy required
  that packages could be rebuilt bit-identically from source. 
  At the moment, it is impossible to independently verify the
  integrity of binary packages.
 
 
  Greetings,
  Martin
 
 Some tools use randomization to escape worst-case situations or for
 general optimization. For example, when you look for an optimal
 register allocation, you can search by picking a random register
 allocation and repeating that a few thousand times to find a
 suitable minimum. Or a randomized heap that gives you O(1) time for
 all operations instead of O(lg n).

I do not know of any compiler which does register allocation
like that. This would be a debugging nightmare!
 
 By requiring bit-for-bit identical results, you eliminate all such
 randomness and could seriously restrict the algorithms available to
 tools.

Such algorithms would certainly use a pseudo-random number generator
which could be seeded identically in each build.
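This suggestion can be illustrated with a toy randomized search: because the generator is seeded explicitly (here with Python's `random.Random`), every run explores the same candidates and returns the same answer. The cost function and the search are hypothetical stand-ins for a heuristic such as register allocation, not the algorithm of any real compiler.

```python
import random
from typing import List, Sequence


def cost(order: Sequence[int], weights: Sequence[int]) -> int:
    # Hypothetical cost function: heavier items should come first.
    return sum(pos * weights[item] for pos, item in enumerate(order))


def random_search(weights: Sequence[int], tries: int, seed: int) -> List[int]:
    """Randomized search whose result depends only on its inputs and the seed."""
    rng = random.Random(seed)  # private generator; ignores global random state
    best = list(range(len(weights)))
    best_cost = cost(best, weights)
    for _ in range(tries):
        candidate = best[:]
        rng.shuffle(candidate)
        candidate_cost = cost(candidate, weights)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best


weights = [3, 1, 4, 1, 5, 9, 2, 6]
# Two "builds" with the same seed explore the same candidates and agree.
print(random_search(weights, 1000, seed=42) == random_search(weights, 1000, seed=42))  # True
```

As Goswin points out elsewhere in the thread, the flip side is that a fixed seed makes the worst case just as reproducible as the good cases.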

Do you actually know of any tool which produces different output
each time because of the use of a randomized algorithm?

Martin





Re: Building packages three times in a row

2007-09-24 Thread Jörg Sommer
Hello Goswin,

Goswin von Brederlow [EMAIL PROTECTED] wrote:
 Some tools use randomization to escape worst-case situations or for
 general optimization. For example, when you look for an optimal
 register allocation, you can search by picking a random register
 allocation and repeating that a few thousand times to find a
 suitable minimum. Or a randomized heap that gives you O(1) time for
 all operations instead of O(lg n).

Can you name such a tool? gcc?

Jörg.
-- 
Professor: 'God', an incomprehensible and mythical being that manifests
  itself once a week among mortals in order to bring wisdom to the
  people on overhead slides. (Dschungelbuch 11, FSU Jena)





Re: Building packages three times in a row

2007-09-23 Thread Martin Uecker

Patrick Winnertz wrote:
 On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
   Hmmhh, what do you do about programs etc. that encode the build-time in
   the binary? I mean they obviously will change between builds?
 
  Hopefully they don't encode the build-time in the file list?
 We did not check for files that differ, only for files that are missing
 in the first package or missing in the second package.


I think it would be really cool if the Debian policy required
that packages could be rebuilt bit-identically from source. 
At the moment, it is impossible to independently verify the
integrity of binary packages.


Greetings,
Martin




Re: Building packages three times in a row

2007-09-23 Thread Neil Williams
On Sun, 23 Sep 2007 23:32:59 +0200
Martin Uecker [EMAIL PROTECTED] wrote:

 
 Patrick Winnertz wrote:
   On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
Hmmhh, what do you do about programs etc. that encode the build-time in
the binary? I mean they obviously will change between builds?
  
    Hopefully they don't encode the build-time in the file list?
   We did not check for files that differ, only for files that are missing
   in the first package or missing in the second package.
 
 
 I think it would be really cool if the Debian policy required
 that packages could be rebuilt bit-identically from source. 
 At the moment, it is impossible to independently verify the
 integrity of binary packages.

This has been covered before - certain upstream macros are among many
factors that ensure that this is unlikely. I, for one, use such macros
upstream to indicate the build time of the actual executable installed,
so this will change the binary every time it is built.

You have md5sums and GnuPG signatures on the Release files - I see no
benefit from bit-matching.

-- 

Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/




Re: Building packages three times in a row

2007-09-23 Thread Martin Uecker

Neil Williams [EMAIL PROTECTED]:
 Martin Uecker [EMAIL PROTECTED] wrote:

[...]

  
  I think it would be really cool if the Debian policy required
  that packages could be rebuilt bit-identically from source. 
  At the moment, it is impossible to independently verify the
  integrity of binary packages.

 This has been covered before - certain upstream macros are among 
 many factors that ensure that this is unlikely. I, for one, use such
 macros upstream to indicate the build time of the actual executable
 installed, so this will change the binary every time it is built.

This could be fixed.
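One way such a macro could be fixed is to take the timestamp from the build environment rather than the wall clock, so that rebuilding the same source yields the same string. The sketch below assumes the environment variable name `SOURCE_DATE_EPOCH`, a convention the Reproducible Builds project standardised years after this discussion; `build_stamp` and `BUILD_STAMP` are hypothetical names, and the code falls back to the current time when the variable is unset.

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_stamp(env: Optional[Mapping[str, str]] = None) -> str:
    """Build timestamp string, pinned to SOURCE_DATE_EPOCH when that is set."""
    if env is None:
        env = os.environ
    epoch = env.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Format in UTC so the result does not depend on the builder's time zone.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


# A build script could write this into a generated header instead of
# relying on a macro that expands to the current time:
stamp = build_stamp({"SOURCE_DATE_EPOCH": "0"})
print(f'#define BUILD_STAMP "{stamp}"')  # #define BUILD_STAMP "1970-01-01 00:00:00 UTC"
```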

 You have md5sums and GnuPG signatures on the Release files - I see
 no benefit from bit-matching.

The build host could be compromised. Not that unlikely.


Martin





Re: Building packages three times in a row

2007-09-23 Thread Benjamin A'Lee
On Mon, Sep 24, 2007 at 12:54:58AM +0200, Martin Uecker wrote:
 Neil Williams [EMAIL PROTECTED]:
  This has been covered before - certain upstream macros are among 
  many factors that ensure that this is unlikely. I, for one, use such
  macros upstream to indicate the build time of the actual executable
  installed, so this will change the binary every time it is built.
 
 This could be fixed.

In every binary that includes the build date in it? There's rather a
lot; off the top of my head, Vim does it, and so does the Linux kernel
AFAIK.

  You have md5sums and GnuPG signatures on the Release files - I see
  no benefit from bit-matching.
 
 The build host could be compromised. Not that unlikely.

And if the build host was compromised, how would that help any more than
md5sums and gpg-signing? With access to the build host, whatever list of
bits to match could be changed along with the binary, the md5sum, and
the gpg-signature.

Anyway, surely the point of hashes like md5, sha1, etc, is that it's
much faster to do that than to compare large files bit by bit?

-- 
Benjamin A'Lee [EMAIL PROTECTED]
http://subvert.org.uk/~bma/




Re: Building packages three times in a row

2007-09-23 Thread Martin Uecker

Benjamin A'Lee [EMAIL PROTECTED]:
 On Mon, Sep 24, 2007 at 12:54:58AM +0200, Martin Uecker wrote:
  Neil Williams [EMAIL PROTECTED]:
   This has been covered before - certain upstream macros are among
   many factors that ensure that this is unlikely. I, for one, use such
   macros upstream to indicate the build time of the actual executable
   installed, so this will change the binary every time it is built.
  
  This could be fixed.
 
 In every binary that includes the build date in it? There's rather a
 lot; off the top of my head, Vim does it, and so does the Linux
 kernel AFAIK.

I know. In a world where providing a correctly working clean
target is already an issue, that's pretty far-fetched.
But IMHO being able to recreate binaries from source code in
a reproducible way would be a milestone for security and QA.

   You have md5sums and GnuPG signatures on the Release files - I
   see no benefit from bit-matching.
  
  The build host could be compromised. Not that unlikely.

 And if the build host was compromised, how would that help any more
 than md5sums and gpg-signing? With access to the build host, whatever
 list of bits to match could be changed along with the binary, the md5sum,
 and the gpg-signature.

 Anyway, surely the point of hashes like md5, sha1, etc, is that it's
 much faster to do that than to compare large files bit by bit?

The idea is not to replace hashes by bit-by-bit comparison, but to
be able to *independently* reproduce binaries from source code in
a bit-identical way. Then third parties can recreate the binaries
and publish recreated hashes. If the recreated hashes are identical,
then you can be sure that nobody has tampered with the build process
and the binary is actually created from the unmodified sources. The
current scheme just protects against tampering after signing. That
is actually not very much.
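Once builds are bit-identical, the verification step described above needs nothing more than ordinary hashing: each party hashes its own rebuild, and the results are compared with the archive's. A minimal sketch follows; the function names are hypothetical, and SHA-256 is an illustrative choice (the thread itself mentions md5 and sha1).

```python
import hashlib
from typing import Dict, List


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rebuilds_agree(official: str, rebuilt: List[str]) -> Dict[str, object]:
    """Compare the archive's hash with hashes published by independent rebuilders."""
    mismatches = [h for h in rebuilt if h != official]
    # Zero independent rebuilds means nothing has been verified yet.
    return {"agree": bool(rebuilt) and not mismatches, "mismatches": mismatches}


# Usage (hypothetical file name and list of third-party hashes):
# rebuilds_agree(sha256_of("foo_1.0-1_i386.deb"), hashes_from_third_parties)
```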


Martin




Re: Building packages three times in a row

2007-09-18 Thread Julien Cristau
On Tue, Sep 18, 2007 at 20:49:03 +0200, Soeren Sonnenburg wrote:

 On Mon, 2007-09-10 at 22:34 +0200, Patrick Winnertz wrote:
  Hi,
 [...]
  Furthermore, we detected some issues with different package contents (compared 
  to the first build) after the second and third builds. These bugs will have 
  Severity: Serious.
 
 Hmmhh, what do you do about programs etc. that encode the build-time in
 the binary? I mean they obviously will change between builds?
 
Hopefully they don't encode the build-time in the file list?

Cheers,
Julien





Re: Building packages three times in a row

2007-09-18 Thread Soeren Sonnenburg
On Mon, 2007-09-10 at 22:34 +0200, Patrick Winnertz wrote:
 Hi,
[...]
 Furthermore, we detected some issues with different package contents (compared 
 to the first build) after the second and third builds. These bugs will have 
 Severity: Serious.

Hmmhh, what do you do about programs etc. that encode the build-time in
the binary? I mean they obviously will change between builds?

Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.




Re: Building packages three times in a row

2007-09-18 Thread Patrick Winnertz
On Tuesday, 18 September 2007 21:12:44, Julien Cristau wrote:
  Hmmhh, what do you do about programs etc. that encode the build-time in
  the binary? I mean they obviously will change between builds?

 Hopefully they don't encode the build-time in the file list?
We did not check for files that differ, only for files that are missing
in the first package or missing in the second package.

For example, aptitude:
In the second build, all of the .mo files are missing.

You may get an idea if you look at the logs:
http://people.debian.org/~lucas/logs/2007/doublebuild-09-05/failed-debdiff/


FYI: I filed all FTBFS bugs on Sunday 16.09. Since this was about 10 days 
after the rebuild (I didn't have enough time in between), some packages 
were already fixed by Sunday. Sorry to the people who fixed these issues 
with a new upload between the rebuild and my bug reports ;-)

Greetings
Patrick


 Cheers,
 Julien



-- 
 .''`.   Patrick Winnertz [EMAIL PROTECTED]
:  :' :  GNU/Linux Debian-Edu Developer
`. `'`   http://www.der-winnie.de http://d.skolelinux.org/~winnie
  `-  Debian - when you have better things to do than fixing systems





Re: Building packages three times in a row

2007-09-17 Thread Neil Williams
On Mon, 10 Sep 2007 22:34:52 +0200
Patrick Winnertz [EMAIL PROTECTED] wrote:

 as a QA effort the whole archive was rebuilt over the weekend to catch
 build failures and to check whether a package can be built three times in
 a row (unpack, build, clean, build, clean, build). 

What happens with false positives? No script is perfect - it appears
that this script has got it wrong in the case of libgpeschedule at least.

 This is the second effort to get rid of those issues. The first effort was 
 announced by Martin Zobel-Helas on 16 May 2007 [0].

Something went awry between the two because my packages were fine on
the first one; now just one of them is reported to fail in a way that I
simply cannot replicate.

 This must undo any effects that the build and binary targets may
 have had, except that it should leave alone any output files created
 in the parent directory by a run of a binary target.

AFAICT the clean target is fine.
 
 Please note that building a package twice in a row is a release goal for 
 lenny. 

And libgpeschedule does build two, three or more times in a row. I
don't understand why the test routine shows a failure when AFAICT none
exists. The build log makes no sense and appears to be incomplete, so I
have no way of replicating the build and no way to fix it. 

Unless someone can demonstrate whether something is actually going
wrong and give me some ideas on how to fix it, I'm going to have to
close 442636 as an artifact of a broken tool.

-- 


Neil Williams
=
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/





Building packages three times in a row

2007-09-10 Thread Patrick Winnertz
Hi,

as a QA effort the whole archive was rebuilt over the weekend to catch
build failures and to check whether a package can be built three times in
a row (unpack, build, clean, build, clean, build). 
This is the second effort to get rid of those issues. The first effort was 
announced by Martin Zobel-Helas on 16 May 2007 [0].

Again, we found about 400 packages without a sane clean target. 

To cite
http://www.debian.org/doc/debian-policy/ch-source.html#s-debianrules

clean

This must undo any effects that the build and binary targets may
have had, except that it should leave alone any output files created
in the parent directory by a run of a binary target.
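The unpack, build, clean, build, clean, build cycle described above essentially checks that `clean` restores the source tree. Below is a minimal sketch of such a check, not the script actually used for the archive rebuild: it records a content hash for every file, runs caller-supplied build and clean steps, and reports whatever `clean` failed to undo. The helper names are hypothetical; in practice the steps would presumably wrap `debian/rules build` and `debian/rules clean`.

```python
import hashlib
import os
from typing import Callable, Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map every file under root (relative path) to a SHA-256 of its contents."""
    state: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                state[os.path.relpath(full, root)] = hashlib.sha256(fh.read()).hexdigest()
    return state


def check_repeated_builds(root: str, build: Callable[[], None],
                          clean: Callable[[], None], rounds: int = 3) -> List[str]:
    """Run build/clean cycles and list files that clean failed to restore.

    The sequence is build, clean, build, clean, ..., build; the tree is
    compared with its pristine state after every clean.
    """
    pristine = snapshot(root)
    problems: List[str] = []
    for i in range(rounds):
        build()  # an exception here corresponds to a failure to build again
        if i == rounds - 1:
            break  # keep the final build, as in unpack, build, clean, build, clean, build
        clean()
        after = snapshot(root)
        for path in sorted(pristine.keys() | after.keys()):
            if pristine.get(path) != after.get(path):
                problems.append(f"round {i + 1}: {path}")
    return problems
```

Only the unpacked source tree is snapshotted, so files that a binary target writes to the parent directory are ignored, in line with the policy text quoted above.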

We'll file bug reports against every package that FTBFS in this way with 
Severity: Important. 

Furthermore, we detected some issues with different package contents (compared 
to the first build) after the second and third builds. These bugs will have 
Severity: Serious.

Once we have finished, you'll find all filed bugs either here [1] or here 
[2].

Please note that building a package twice in a row is a release goal for 
lenny. 

Greetings
Patrick Winnertz


[0]:http://lists.debian.org/debian-devel/2007/05/msg00490.html
[1]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-doublebuild
[2]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-debdiff-differ
-- 
 .''`.   Patrick Winnertz [EMAIL PROTECTED]
:  :' :  GNU/Linux Debian-Edu Developer
`. `'`   http://www.der-winnie.de http://d.skolelinux.org/~winnie
  `-  Debian - when you have better things to do than fixing systems





Re: Building packages three times in a row

2007-09-10 Thread Patrick Winnertz
On Monday, 10 September 2007 22:34:52, Patrick Winnertz wrote:
 Hi,

 as a QA effort the whole archive was rebuilt over the weekend to catch
 build failures and to check whether a package can be built three times in
 a row (unpack, build, clean, build, clean, build).
 This is the second effort to get rid of those issues. The first effort
 was announced by Martin Zobel-Helas on 16 May 2007 [0].

 Again, we found about 400 packages without a sane clean target.

 To cite
 http://www.debian.org/doc/debian-policy/ch-source.html#s-debianrules

 clean

 This must undo any effects that the build and binary targets may
 have had, except that it should leave alone any output files created
 in the parent directory by a run of a binary target.

 We'll file bug reports against every package that FTBFS in this way with
 Severity: Important.

 Furthermore, we detected some issues with different package contents
 (compared to the first build) after the second and third builds. These
 bugs will have Severity: Serious.

 Once we have finished, you'll find all filed bugs either here [1] or here
 [2].

 Please note that building a package twice in a row is a release goal for
 lenny.

 Greetings
 Patrick Winnertz


 [0]:http://lists.debian.org/debian-devel/2007/05/msg00490.html
 [1]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-doublebuild
 [2]:http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-debdiff-differ

Mmpf... I have to correct the third link; there is a small typo.

You can see the filed bug reports about different package contents after 
the build here:

http://bugs.debian.org/cgi-bin/[EMAIL PROTECTED];tag=qa-debdiff

Thanks.

Greetings
Patrick Winnertz

-- 
 .''`.   Patrick Winnertz [EMAIL PROTECTED]
:  :' :  GNU/Linux Debian-Edu Developer
`. `'`   http://www.der-winnie.de http://d.skolelinux.org/~winnie
  `-  Debian - when you have better things to do than fixing systems

