Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-18 Thread Chris Demetriou
On Thu, Mar 18, 2010 at 22:41, Ralf Wildenhues  wrote:
> Now that the locale normalization patch is in place, I've committed this
> patch now, and added Chris to THANKS.

Sweet, thanks!


chris




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-18 Thread Ralf Wildenhues
* Bob Friesenhahn wrote on Tue, Mar 16, 2010 at 10:12:07PM CET:
> On Tue, 16 Mar 2010, Ralf Wildenhues wrote:
> >
> >Yes, that may be it.  However, that also means that, while the patch
> >fixes things for you, it doesn't really add value to Libtool in the
> >sense that we cannot guarantee an improvement of some portable kind to
> >users.  That's ok per se, but of course not ideal.  ;-)
> 
> As long as build portability does not suffer, I think it is a good
> thing for builds to be as deterministic as possible.  Otherwise it
> is possible for the built binaries to perform differently from build
> to build.

I agree; it's just that we still can't promise anything.  Anyway, build
performance should not suffer much from this patch, since the sorts
are added only in places where we already fork a lot, so that isn't a big
problem either.

Now that the locale normalization patch is in place, I've committed this
patch now, and added Chris to THANKS.

Cheers,
Ralf

2010-03-19  Chris Demetriou  

Sort output of 'find' to enable deterministic builds.
* libltdl/config/ltmain.m4sh (func_extract_archives): Sort
output of 'find'.
* libltdl/m4/libtool.m4 (_LT_LANG_CXX_CONFIG): Likewise.
* THANKS: Update.




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-16 Thread Bob Friesenhahn

On Tue, 16 Mar 2010, Ralf Wildenhues wrote:

> Yes, that may be it.  However, that also means that, while the patch
> fixes things for you, it doesn't really add value to Libtool in the
> sense that we cannot guarantee an improvement of some portable kind to
> users.  That's ok per se, but of course not ideal.  ;-)

As long as build portability does not suffer, I think it is a good 
thing for builds to be as deterministic as possible.  Otherwise it is 
possible for the built binaries to perform differently from build to 
build.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-16 Thread Chris Demetriou
On Tue, Mar 16, 2010 at 13:12, Ralf Wildenhues  wrote:
> Yes, that may be it.  However, that also means that, while the patch
> fixes things for you, it doesn't really add value to Libtool in the
> sense that we cannot guarantee an improvement of some portable kind to
> users.  That's ok per se, but of course not ideal.  ;-)

Sort of.  It (or something similar) is necessary, but not sufficient.

In order to get a benefit from this, users definitely need a
tightly-controlled environment.  It's achievable with relatively
little effort with ELF, tho.

OTOH, without this, even with a tightly-controlled environment, the
only way to achieve the desired result is by wrapping/hacking find (to
produce a deterministic output order) or by making a similar change
locally, or by restricting the environment even more tightly (e.g.,
filesystems or host OSes used for build).

It's not particularly hard for people who care to replicate this
change (it took me longer to rebuild my tools with this change on N
host systems to verify the change, than it did to identify the source
of the problem once I started looking)... but IMO it does add *some*
value.


>> (BTW, my purpose isn't to be able to compare output files so much as
>> create bit-identical output files.  There are several reasons for
>> this, but in summary, comparison isn't quite good enough.
>
> Wanna share a couple of those reasons why comparison isn't good enough?

Philosophically, I strongly believe that bit-for-bit reproducible
builds are good.  They give me confidence in my own build processes.

However, there are several points related to this, why I prefer to
avoid comparison tools:

* Let's start off with "I'm lazy!" 8-)  If I've gotta maintain a
custom diff tool, then that's one more tool that I have to maintain.
It *will* break, and it will grow more and more special cases and hair
over time -- and IMO the very existence of a tool encourages people to
make changes which result in more and more binary divergences from
build to build.  Can do it, but bit-identical is better, and
zero-tolerance for divergence means less long-term maintenance: it's
the responsibility of people making changes to make sure they work
properly w.r.t. reproducible builds.  (Related: if I have to maintain
a custom diff tool, e.g., to compare the contents of RPM packages...
I'd rather write one that's simpler, e.g., operates on the hashes of
the files in the package, rather than extracts the package contents
and does special comparisons.)

* convincing others and/or auditing changes.  If I want to convince
you that my build processes produce consistent results from build to
build (e.g., across different host system types), or that my new
release contains *no* changes to a particular component... which would
you rather see: bit-for-bit identical or "mutated within acceptable
limits" (as verified by a special tool)?   Again, use of a custom
comparison tool is possible, but not ideal.

* revision control and build systems.  For some purposes, we check
toolchains (including related libraries) into revision control, in
which case diffs -> extra revisions.  For build or revision control
systems which cache or which are content-addressable (e.g., use file
hashes to look up files), any bit different == "completely different,"
with follow-on inefficiencies.  Again, could be addressed via a
special tool (just like I know people have written 'ar' file timestamp
mungers, before I added 'D' support to 'ar')... but just better to get
it right during the build.
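A hash-based comparison of the kind mentioned in the first point can stay quite small. A minimal sketch, assuming two already-unpacked trees (e.g., extracted package contents), GNU coreutils' `sha256sum`, and file names without embedded newlines; `manifest` and `same_build` are hypothetical helper names, not part of libtool:

```sh
#!/bin/sh
# Sketch of a hash-based build comparison (hypothetical helpers).
# Assumes GNU coreutils sha256sum and file names without newlines.

# manifest DIR: print "digest  ./path" for every regular file under DIR,
# sorted byte-wise (C locale) so the listing does not depend on find's order.
manifest () {
  (cd "$1" && find . -type f -print | LC_ALL=C sort |
    while IFS= read -r f; do
      sha256sum "$f"
    done)
}

# same_build DIR1 DIR2: succeed only if both trees hold the same paths
# with bit-identical contents; return 2 if a directory can't be read.
same_build () {
  m1=$(manifest "$1") || return 2
  m2=$(manifest "$2") || return 2
  [ "$m1" = "$m2" ]
}
```

Sorting the `find` output in the C locale is what keeps the manifest itself stable; without it, the comparison tool would inherit the same ordering problem the patch fixes.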


Those are my thoughts, anyway.


chris




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-16 Thread Ralf Wildenhues
* Chris Demetriou wrote on Tue, Mar 16, 2010 at 08:28:12AM CET:
> On Mon, Mar 15, 2010 at 23:37, Ralf Wildenhues wrote:
> > libtool also uses 'ar' in a number of places and cases, most
> > prominently, but not limited to, the static linking scenario.
> > Do you have measures in place to use 'D' there too?
> 
> Yes.  At least in all the places I've noticed (or that are relevant
> for a Linux / ELF build), libtool actually uses $AR.  Likewise with
> $RANLIB.
> 
> If you want a repeatable build that avoids ar timestamps, you need:
>   AR=... wrapper script that adds 'D' and strips 'u' from ar's first arg ...
>   RANLIB=/bin/true or wrapper script that invokes /bin/true.

OK.

> (I didn't grep through all of the libtool code, but AFAICT at least
> ltmain.sh uses $AR and $RANLIB in preference to hard-coding ar and
> ranlib ... which makes sense, since, for example, it's gotta be
> cross-compilation friendly.)

Yes.

> > IIRC some (non-GNU) compilers (or the assemblers they call, I don't
> > recall) also add time stamps to generated objects.  Are they relevant to
> > you?  I'm not sure whether one can turn stamps off everywhere; have you
> > looked into normalizing output like in GCC's contrib/compare-debug?
> 
> In my experience, it's the object formats.

Yes, that may be it.  However, that also means that, while the patch
fixes things for you, it doesn't really add value to Libtool in the
sense that we cannot guarantee an improvement of some portable kind to
users.  That's ok per se, but of course not ideal.  ;-)

> E.g., COFF objects
> (including ECOFF and PE) include a timestamp.  Looks like BFD puts in
> 0 for that in most cases, but e.g. the bfd/peXXigen.c code looks like
> it *will* generate a timestamp.  Those object file formats are not
> relevant to me, though I've seen environments before where they're
> overridden.  (I once wrote a small set of tools to do a
> timestamp-insensitive compare for windows builds.  I've thankfully
> been *mostly* COFF-free for the past 10+ years.  8-)
> 
> (BTW, my purpose isn't to be able to compare output files so much as
> create bit-identical output files.  There are several reasons for
> this, but in summary, comparison isn't quite good enough.

Wanna share a couple of those reasons why comparison isn't good enough?

>  In order to
> do this, you've gotta do a bunch of other things, e.g., build with the
> same paths, use -frandom-seed appropriately if building some C++ code,
> ...)

> > Then, it would be nice to be able to confirm that we produce stable
> > output in cases where this is desirable; IOW, have testsuite exposure.
> > Can you describe your setup in a bit more detail?  (You could also
> > provide a test case, but that will probably require copyright papers
> > then first.)
> 
> Actually, my employer has a blanket assignment AFAIU, so that's not a
> problem.

Wasn't aware of that.

Thanks,
Ralf




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-16 Thread Chris Demetriou
Hi Ralf,

On Mon, Mar 15, 2010 at 23:37, Ralf Wildenhues  wrote:
> libtool also uses 'ar' in a number of places and cases, most
> prominently, but not limited to, the static linking scenario.
> Do you have measures in place to use 'D' there too?

Yes.  At least in all the places I've noticed (or that are relevant
for a Linux / ELF build), libtool actually uses $AR.  Likewise with
$RANLIB.

If you want a repeatable build that avoids ar timestamps, you need:
  AR=... wrapper script that adds 'D' and strips 'u' from ar's first arg ...
  RANLIB=/bin/true or wrapper script that invokes /bin/true.
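A sketch of such an AR wrapper, for illustration only: `det_ar_flags`, `det_ar` and `REAL_AR` are hypothetical names, and it assumes the real `ar` is a GNU ar recent enough to understand the 'D' modifier. Like the description above, it touches only the first argument.

```sh
#!/bin/sh
# Sketch of a deterministic-ar wrapper (hypothetical; not shipped with libtool).
# Assumes the real archiver, named by REAL_AR (default "ar"), supports 'D'.

# Rewrite ar's first argument (operation plus modifiers): drop every 'u'
# (it decides what to replace by comparing timestamps, which 'D' zeroes)
# and make sure 'D' is present exactly once.
det_ar_flags () {
  flags=$(printf '%s\n' "$1" | tr -d u)
  case $flags in
    *D*) ;;
    *) flags=${flags}D ;;
  esac
  printf '%s\n' "$flags"
}

# det_ar FLAGS ARGS...: run the real ar with the rewritten first argument.
det_ar () {
  if [ $# -eq 0 ]; then
    "${REAL_AR:-ar}"
    return
  fi
  flags=$(det_ar_flags "$1")
  shift
  "${REAL_AR:-ar}" "$flags" "$@"
}
```

Installed as an executable script whose last line is `det_ar "$@"`, it could be passed in as `AR=/path/to/det-ar` alongside `RANLIB=/bin/true`.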

(I didn't grep through all of the libtool code, but AFAICT at least
ltmain.sh uses $AR and $RANLIB in preference to hard-coding ar and
ranlib ... which makes sense, since, for example, it's gotta be
cross-compilation friendly.)


> IIRC some (non-GNU) compilers (or the assemblers they call, I don't
> recall) also add time stamps to generated objects.  Are they relevant to
> you?  I'm not sure whether one can turn stamps off everywhere; have you
> looked into normalizing output like in GCC's contrib/compare-debug?

In my experience, it's the object formats.  E.g., COFF objects
(including ECOFF and PE) include a timestamp.  Looks like BFD puts in
0 for that in most cases, but e.g. the bfd/peXXigen.c code looks like
it *will* generate a timestamp.  Those object file formats are not
relevant to me, though I've seen environments before where they're
overridden.  (I once wrote a small set of tools to do a
timestamp-insensitive compare for windows builds.  I've thankfully
been *mostly* COFF-free for the past 10+ years.  8-)

(BTW, my purpose isn't to be able to compare output files so much as
create bit-identical output files.  There are several reasons for
this, but in summary, comparison isn't quite good enough.  In order to
do this, you've gotta do a bunch of other things, e.g., build with the
same paths, use -frandom-seed appropriately if building some C++ code,
...)


> Then, it would be nice to be able to confirm that we produce stable
> output in cases where this is desirable; IOW, have testsuite exposure.
> Can you describe your setup in a bit more detail?  (You could also
> provide a test case, but that will probably require copyright papers
> then first.)

Actually, my employer has a blanket assignment AFAIU, so that's not a problem.

My test case is a GCC build, built on three different host system
types (Ubuntu Dapper, Ubuntu Hardy, and Ubuntu pre-Lucid).  libstdc++
is big enough and the filesystems different enough w.r.t. 'find'
behavior that I get different orders on different systems.  Not an
easily-rolled-up test case.  8-)

The problem with writing a test case is that find's traversal order is
not guaranteed.  It depends on a whole bunch of things, but most
especially the implementation of readdir (or whatever similar
interface it uses to read dir entries -- I haven't looked inside of
GNU find ... maybe ever 8-), which often depends on the underlying
file system.

e.g., on my (Hardy) workstation:

# /tmp is on the ext3 root file system:
[...@cgda v14]$ mkdir /tmp/a
[...@cgda v14]$ touch /tmp/a/a.o
[...@cgda v14]$ touch /tmp/a/b.o
[...@cgda v14]$ mkdir /tmp/b
[...@cgda v14]$ touch /tmp/b/b.o
[...@cgda v14]$ touch /tmp/b/a.o
[...@cgda v14]$ find /tmp/a
/tmp/a
/tmp/a/a.o
/tmp/a/b.o
[...@cgda v14]$ find /tmp/b
/tmp/b
/tmp/b/a.o
/tmp/b/b.o

# ~/tmp is on NFS.
[...@cgda v14]$ mkdir ~/tmp/a ~/tmp/b
[...@cgda v14]$ touch ~/tmp/a/a.o
[...@cgda v14]$ touch ~/tmp/a/b.o
[...@cgda v14]$ touch ~/tmp/b/b.o
[...@cgda v14]$ touch ~/tmp/b/a.o
[...@cgda v14]$ find ~/tmp/a
/home/cgd/tmp/a
/home/cgd/tmp/a/a.o
/home/cgd/tmp/a/b.o
[...@cgda v14]$ find ~/tmp/b
/home/cgd/tmp/b
/home/cgd/tmp/b/b.o
/home/cgd/tmp/b/a.o

(Note how in /tmp, things came out in a consistent order, and in ~/tmp
things came out in the order that they were created.)

This makes it harder to make a test case where you're *sure* that the
test is actually effective.
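For comparison, here is the same experiment with the listing piped through `sort`, as the patch does. A minimal sketch, assuming a POSIX shell plus `mktemp -d` (common, though not POSIX). The scratch directory lands on whatever file system `mktemp` picks, so the unsorted listings may or may not differ there, which is exactly the testing difficulty described above:

```sh
#!/bin/sh
# Recreate the transcript's two directories: same names, opposite
# creation order.
top=$(mktemp -d)
mkdir "$top/a" "$top/b"
touch "$top/a/a.o"; touch "$top/a/b.o"
touch "$top/b/b.o"; touch "$top/b/a.o"

# List relative to each directory so the two outputs are comparable.
list_raw ()    { (cd "$1" && find . -name '*.o' -print); }
list_sorted () { (cd "$1" && find . -name '*.o' -print | LC_ALL=C sort); }

raw_a=$(list_raw "$top/a");       raw_b=$(list_raw "$top/b")
sorted_a=$(list_sorted "$top/a"); sorted_b=$(list_sorted "$top/b")

# The raw order depends on the file system; the sorted order does not.
if [ "$raw_a" != "$raw_b" ]; then
  echo "unsorted find order differs on this file system"
fi
if [ "$sorted_a" = "$sorted_b" ]; then
  echo "sorted order is identical"
fi
rm -rf "$top"
```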

Probably the easiest test likely to expose the problem would be...
 * create an archive (normal or libtool, I don't know what ltmain's
func_extract_archives wants to operate on) with members b.o and a.o (in
that order)
 * convince ltmain to generate a new archive including them, then
verify that the new archive has them in the order a.o and b.o.

but obviously that wouldn't actually test anything in many
circumstances (e.g., my /tmp example above).  (AFAICT it doesn't look
like libtool ever has to deal with indexless/ancient-BSD archives,
where you had to use lorder|tsort to order the archive members...
that's good because that would make the test a lot more difficult.
8-)

Unfortunately, to be honest... I don't really have a clue about how to
implement a test case like this.  I've never actually attempted to use
libtool to do anything (e.g., create archives)...

Also complicating the issue is that I really don't know thing one
about libtool, e.g., how to make it do what I want in this case.
Getting a

Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-15 Thread Ralf Wildenhues
Hello Chris,

thanks for the report and patch.

* Chris Demetriou wrote on Mon, Mar 15, 2010 at 08:12:31PM CET:
> A project that I work on wants to make sure our builds are
> deterministic, i.e., same input sources -> same exact output binaries.
> We've solved several problems (e.g., I've added the ar 'D' flag)
> related to this.  The last remaining issue is in libtool.

libtool also uses 'ar' in a number of places and cases, most
prominently, but not limited to, the static linking scenario.
Do you have measures in place to use 'D' there too?

IIRC some (non-GNU) compilers (or the assemblers they call, I don't
recall) also add time stamps to generated objects.  Are they relevant to
you?  I'm not sure whether one can turn stamps off everywhere; have you
looked into normalizing output like in GCC's contrib/compare-debug?

Then, it would be nice to be able to confirm that we produce stable
output in cases where this is desirable; IOW, have testsuite exposure.
Can you describe your setup in a bit more detail?  (You could also
provide a test case, but that will probably require copyright papers
then first.)

Thanks,
Ralf

> 2010-03-15  Chris Demetriou  
> 
> Sort output of 'find' to enable deterministic builds.
> * libltdl/config/ltmain.m4sh (func_extract_archives): Sort
> output of 'find'.
> * libltdl/m4/libtool.m4 (_LT_LANG_CXX_CONFIG): Likewise.




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-15 Thread Chris Demetriou
I *think* it's sufficient as-is.

LC_ALL (and several other variables) are set via
libltdl/config/ltmain.m4sh circa line 114

Looks like configure (which is ultimately where libtool.m4 gets
included) does it too, via autoconf's _AS_SHELL_SANITIZE... I can't find
the exact path that causes it to be included, but LC_ALL is set at the
start of every autoconf-generated script I can find.


(And oh, my build scripts -- independent of all of the above -- do it,
too, so *I'm* covered N different ways...  8-)
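Eric's concern (quoted below) is that `sort` collates according to the caller's locale. In many non-C locales the first comparison level ignores case, so a name like `B.o` lands next to `b.o`, while the C locale compares raw bytes. A small sketch of both forms discussed in this thread: a per-command `LC_ALL=C sort`, and exporting `LC_ALL=C` once for the whole script, which is what Chris reports ltmain.m4sh already does. The file names are made up:

```sh
#!/bin/sh
names='b.o
B.o
a.o'

# Per-command form: force byte-wise collation for this one sort.
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$sorted"   # prints B.o, a.o, b.o (one per line)

# Script-wide form: set and export LC_ALL=C once near the top; every
# later bare 'sort' then uses the same byte-wise order.
LC_ALL=C
export LC_ALL
sorted_global=$(printf '%s\n' "$names" | sort)
```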



chris

On Mon, Mar 15, 2010 at 18:12, Eric Blake  wrote:
> On 03/15/2010 01:12 PM, Chris Demetriou wrote:
>> The attached patch sorts the output of 'find' in all cases (where it's
>> not already sorted), so as to produce deterministic results.
>
> Does libtool globally force the C locale, or do you need to use:
>
> LC_ALL=C sort
>
> throughout your patch to further ensure deterministic behavior across
> different default locales?
>
> --
> Eric Blake   ebl...@redhat.com    +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
>
>




Re: [patch] sort 'find' output to enable deterministic builds.

2010-03-15 Thread Eric Blake
On 03/15/2010 01:12 PM, Chris Demetriou wrote:
> The attached patch sorts the output of 'find' in all cases (where it's
> not already sorted), so as to produce deterministic results.

Does libtool globally force the C locale, or do you need to use:

LC_ALL=C sort

throughout your patch to further ensure deterministic behavior across
different default locales?

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org





[patch] sort 'find' output to enable deterministic builds.

2010-03-15 Thread Chris Demetriou
A project that I work on wants to make sure our builds are
deterministic, i.e., same input sources -> same exact output binaries.
We've solved several problems (e.g., I've added the ar 'D' flag)
related to this.  The last remaining issue is in libtool.

libtool uses 'find' to identify lists of objects to include, in
certain cases.  Unfortunately, the output of 'find' can vary (e.g.,
file system type, order in which objects were created, ...), which
introduces non-determinism into my builds.

The attached patch sorts the output of 'find' in all cases (where it's
not already sorted), so as to produce deterministic results.  For my
particular build, I only *need* the second hunk in ltmain.m4sh
(my_oldobjs=...) but I audited all of the libtool sources for other
uses of find that might result in unsorted output and adjusted them as
well.
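Every hunk makes the same change: `sort` goes between `find` and the newline-to-space conversion. A standalone sketch of the second `ltmain.m4sh` hunk (the `my_oldobjs=` line), under stated assumptions: `collect_objs` is a hypothetical name, `objext` is set to `o`, `NL2SP` is approximated by a plain `tr` (libtool sets up its own definition), and `LC_ALL=C` is given explicitly because this snippet does not run inside ltmain, which sets it globally:

```sh
#!/bin/sh
# Stand-ins for values libtool provides (assumptions for this sketch).
objext=o
NL2SP='tr \012 \040'   # newline -> space; relies on word splitting below

# collect_objs DIR: objects found under DIR, in byte-wise sorted order.
# Without 'sort', the order would be whatever the file system returns.
collect_objs () {
  find "$1" -name \*.$objext -print -o -name \*.lo -print | LC_ALL=C sort | $NL2SP
}

# Example: members created in a scrambled order still come out sorted.
xdir=$(mktemp -d)
touch "$xdir/b.o" "$xdir/c.lo" "$xdir/a.o"
my_oldobjs=`collect_objs "$xdir"`
printf '%s\n' "$my_oldobjs"
```

Note the trailing space in the result: `tr` turns the final newline into a space, which command substitution does not strip, matching how the original `my_oldobjs=...` line accumulates its list.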

I can only easily test on an Ubuntu Hardy system.  On that, with this
change applied, 'make check' passes.



thanks,

chris
---
2010-03-15  Chris Demetriou  

Sort output of 'find' to enable deterministic builds.
* libltdl/config/ltmain.m4sh (func_extract_archives): Sort
output of 'find'.
* libltdl/m4/libtool.m4 (_LT_LANG_CXX_CONFIG): Likewise.

diff --git a/libltdl/config/ltmain.m4sh b/libltdl/config/ltmain.m4sh
index 8fcedc9..f2c35b6 100644
--- a/libltdl/config/ltmain.m4sh
+++ b/libltdl/config/ltmain.m4sh
@@ -2311,7 +2311,7 @@ func_extract_archives ()
 	darwin_file=
 	darwin_files=
 	for darwin_file in $darwin_filelist; do
-	  darwin_files=`find unfat-$$ -name $darwin_file -print | $NL2SP`
+	  darwin_files=`find unfat-$$ -name $darwin_file -print | sort | $NL2SP`
 	  $LIPO -create -output "$darwin_file" $darwin_files
 	done # $darwin_filelist
 	$RM -rf unfat-$$
@@ -2326,7 +2326,7 @@ func_extract_archives ()
 func_extract_an_archive "$my_xdir" "$my_xabs"
 	;;
   esac
-  my_oldobjs="$my_oldobjs "`find $my_xdir -name \*.$objext -print -o -name \*.lo -print | $NL2SP`
+  my_oldobjs="$my_oldobjs "`find $my_xdir -name \*.$objext -print -o -name \*.lo -print | sort | $NL2SP`
 done
 
 func_extract_archives_result="$my_oldobjs"
diff --git a/libltdl/m4/libtool.m4 b/libltdl/m4/libtool.m4
index 677505d..d74038f 100644
--- a/libltdl/m4/libtool.m4
+++ b/libltdl/m4/libtool.m4
@@ -6001,20 +6001,20 @@ if test "$_lt_caught_CXX_error" != yes; then
 	  _LT_TAGVAR(prelink_cmds, $1)='tpldir=Template.dir~
 		rm -rf $tpldir~
 		$CC --prelink_objects --instantiation_dir $tpldir $objs $libobjs $compile_deplibs~
-		compile_command="$compile_command `find $tpldir -name \*.o | $NL2SP`"'
+		compile_command="$compile_command `find $tpldir -name \*.o | sort | $NL2SP`"'
 	  _LT_TAGVAR(old_archive_cmds, $1)='tpldir=Template.dir~
 		rm -rf $tpldir~
 		$CC --prelink_objects --instantiation_dir $tpldir $oldobjs$old_deplibs~
-		$AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name \*.o | $NL2SP`~
+		$AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name \*.o | sort | $NL2SP`~
 		$RANLIB $oldlib'
 	  _LT_TAGVAR(archive_cmds, $1)='tpldir=Template.dir~
 		rm -rf $tpldir~
 		$CC --prelink_objects --instantiation_dir $tpldir $predep_objects $libobjs $deplibs $convenience $postdep_objects~
-		$CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'
+		$CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'
 	  _LT_TAGVAR(archive_expsym_cmds, $1)='tpldir=Template.dir~
 		rm -rf $tpldir~
 		$CC --prelink_objects --instantiation_dir $tpldir $predep_objects $libobjs $deplibs $convenience $postdep_objects~
-		$CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib'
+		$CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib'
 	  ;;
 	*) # Version 6 and above use weak symbols
 	  _LT_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'