Re: [patch] sort 'find' output to enable deterministic builds.
On Thu, Mar 18, 2010 at 22:41, Ralf Wildenhues wrote:
> Now that the locale normalization patch is in place, I've committed this
> patch now, and added Chris to THANKS.

Sweet, thanks!

chris
Re: [patch] sort 'find' output to enable deterministic builds.
* Bob Friesenhahn wrote on Tue, Mar 16, 2010 at 10:12:07PM CET:
> On Tue, 16 Mar 2010, Ralf Wildenhues wrote:
> >
> >Yes, that may be it. However, that also means that, while the patch
> >fixes things for you, it doesn't really add value to Libtool in the
> >sense that we cannot guarantee an improvement of some portable kind to
> >users. That's ok per se, but of course not ideal. ;-)
>
> As long as build portability does not suffer, I think it is a good
> thing for builds to be as deterministic as possible. Otherwise it
> is possible for the built binaries to perform differently from build
> to build.

I agree; it's just that we still can't promise anything. Anyway, build
performance should not suffer too much from this patch since the sorts
are added only in places where we fork much already, so that isn't a
big problem either.

Now that the locale normalization patch is in place, I've committed this
patch now, and added Chris to THANKS.

Cheers,
Ralf

2010-03-19  Chris Demetriou

    Sort output of 'find' to enable deterministic builds.
    * libltdl/config/ltmain.m4sh (func_extract_archives): Sort
    output of 'find'.
    * libltdl/m4/libtool.m4 (_LT_LANG_CXX_CONFIG): Likewise.
    * THANKS: Update.
Re: [patch] sort 'find' output to enable deterministic builds.
On Tue, 16 Mar 2010, Ralf Wildenhues wrote:
> Yes, that may be it. However, that also means that, while the patch
> fixes things for you, it doesn't really add value to Libtool in the
> sense that we cannot guarantee an improvement of some portable kind to
> users. That's ok per se, but of course not ideal. ;-)

As long as build portability does not suffer, I think it is a good
thing for builds to be as deterministic as possible. Otherwise it
is possible for the built binaries to perform differently from build
to build.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [patch] sort 'find' output to enable deterministic builds.
On Tue, Mar 16, 2010 at 13:12, Ralf Wildenhues wrote:
> Yes, that may be it. However, that also means that, while the patch
> fixes things for you, it doesn't really add value to Libtool in the
> sense that we cannot guarantee an improvement of some portable kind to
> users. That's ok per se, but of course not ideal. ;-)

Sort-of. It (or something similar) is necessary, but not sufficient.

In order to get a benefit from this, users definitely need a
tightly-controlled environment. It's achievable with relatively little
effort with ELF, tho.

OTOH, without this, even with a tightly-controlled environment, the only
way to achieve the desired result is by wrapping/hacking find (to
produce a deterministic output order) or by making a similar change
locally, or by restricting the environment even more tightly (e.g.,
filesystems or host OSes used for build).

It's not particularly hard for people who care to replicate this change
(it took me longer to rebuild my tools with this change on N host
systems to verify the change, than it did to identify the source of the
problem once I started looking)... but IMO it does add *some* value.

>> (BTW, my purpose isn't to be able to compare output files so much as
>> create bit-identical output files. There are several reasons for
>> this, but in summary, comparison isn't quite good enough.
>
> Wanna share a couple of those reasons why comparison isn't good enough?

Philosophically, I strongly believe that bit-for-bit reproducible builds
are good. They give me confidence in my own build processes.

However, there are several points related to this, why I prefer to avoid
comparison tools:

* Let's start off with "I'm lazy!" 8-) If I've gotta maintain a custom
  diff tool, then that's one more tool that I have to maintain. It
  *will* break, and it will grow more and more special cases and hair
  over time -- and IMO the very existence of a tool encourages people to
  make changes which result in more and more binary divergences from
  build to build. Can do it, but bit-identical is better, and
  zero-tolerance for divergence means less long-term maintenance: it's
  the responsibility of people making changes to make sure they work
  properly w.r.t. reproducible builds. (Related: if I have to maintain
  a custom diff tool, e.g., to compare the contents of RPM packages...
  I'd rather write one that's simpler, e.g., operates on the hashes of
  the files in the package, rather than extracts the package contents
  and does special comparisons.)

* convincing others and/or auditing changes. If I want to convince you
  that my build processes produce consistent results from build to build
  (e.g., across different host system types), or that my new release
  contains *no* changes to a particular component... which would you
  rather see: bit-for-bit identical or "mutated within acceptable
  limits" (as verified by a special tool)? Again, use of a custom
  comparison tool is possible, but not ideal.

* revision control and build systems. For some purposes, we check
  toolchains (including related libraries) into revision control, in
  which case diffs -> extra revisions. For build or revision control
  systems which cache or which are content-addressable (e.g., use file
  hashes to look up files), any bit different == "completely different,"
  with follow-on inefficiencies. Again, could be addressed via a special
  tool (just like I know people have written 'ar' file timestamp
  mungers, before I added 'D' support to 'ar')... but just better to get
  it right during the build.

Those are my thoughts, anyway.

chris
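The "operates on the hashes of the files" idea mentioned above can be sketched in a few lines of shell. This is only an illustration, not a tool from this thread: the helper names tree_manifest and same_tree are hypothetical, sha256sum from GNU coreutils is assumed to be available, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch only: compare two build trees by content hash rather than with
# a format-aware diff.  tree_manifest and same_tree are hypothetical
# helper names; sha256sum (GNU coreutils) is assumed to be installed.

tree_manifest () {
  # Print "<sha256>  <relative path>" for every regular file under $1.
  # The file list is sorted in the C locale so the manifest itself does
  # not depend on find's traversal order or on the user's locale.
  (cd "$1" && find . -type f -print | LC_ALL=C sort |
     while IFS= read -r f; do
       sha256sum "$f"
     done)
}

same_tree () {
  # Succeed only if both trees hold the same paths with the same hashes.
  [ "$(tree_manifest "$1")" = "$(tree_manifest "$2")" ]
}
```

With bit-identical builds, a check such as `same_tree build-on-host1 build-on-host2` (directory names made up here) is all the comparison that is needed; any single differing bit makes it fail.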
Re: [patch] sort 'find' output to enable deterministic builds.
* Chris Demetriou wrote on Tue, Mar 16, 2010 at 08:28:12AM CET:
> On Mon, Mar 15, 2010 at 23:37, Ralf Wildenhues wrote:
> > libtool also uses 'ar' in a number of places and cases, most
> > prominently, but not limited to, the static linking scenario.
> > Do you have measures in place to use 'D' there too?
>
> Yes. At least in all the places I've noticed (or that are relevant
> for a Linux / ELF build), libtool actually uses $AR. Likewise with
> $RANLIB.
>
> If you want a repeatable build that avoids ar timestamps, you need:
> AR=... wrapper script that adds 'D' and strips 'u' from ar's first arg ...
> RANLIB=/bin/true or wrapper script that invokes /bin/true.

OK.

> (I didn't grep through all of the libtool code, but AFAICT at least
> ltmain.sh uses $AR and $RANLIB in preference to hard-coding ar and
> ranlib ... which makes sense, since, for example, it's gotta be
> cross-compilation friendly.)

Yes.

> > IIRC some (non-GNU) compilers (or the assemblers they call, I don't
> > recall) also add time stamps to generated objects. Are they relevant to
> > you? I'm not sure whether one can turn stamps off everywhere; have you
> > looked into normalizing output like in GCC's contrib/compare-debug?
>
> In my experience, it's the object formats.

Yes, that may be it. However, that also means that, while the patch
fixes things for you, it doesn't really add value to Libtool in the
sense that we cannot guarantee an improvement of some portable kind to
users. That's ok per se, but of course not ideal. ;-)

> E.g., COFF objects
> (including ECOFF and PE) include a timestamp. Looks like BFD puts in
> 0 for that in most cases, but e.g. the bfd/peXXigen.c code looks like
> it *will* generate a timestamp. Those object file formats are not
> relevant to me, though I've seen environments before where they're
> overridden. (I once wrote a small set of tools to do a
> timestamp-insensitive compare for windows builds. I've thankfully
> been *mostly* COFF-free for the past 10+ years. 8-)
>
> (BTW, my purpose isn't to be able to compare output files so much as
> create bit-identical output files. There are several reasons for
> this, but in summary, comparison isn't quite good enough.

Wanna share a couple of those reasons why comparison isn't good enough?

> In order to
> do this, you've gotta do a bunch of other things, e.g., build with the
> same paths, use -frandom-seed appropriately if building some C++ code,
> ...)

> > Then, it would be nice to be able to confirm that we produce stable
> > output in cases where this is desirable; IOW, have testsuite exposure.
> > Can you describe your setup in a bit more detail? (You could also
> > provide a test case, but that will probably require copyright papers
> > then first.)
>
> Actually, my employer has a blanket assignment AFAIU, so that's not a
> problem.

Wasn't aware of that.

Thanks,
Ralf
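The AR wrapper described in the quoted text ("adds 'D' and strips 'u' from ar's first arg") might look roughly like the sketch below. It is not the script actually used by anyone in this thread: the names det_ar_flags, det_ar and REAL_AR are hypothetical, and it assumes an ar that understands the 'D' (deterministic) modifier, as GNU ar does.

```shell
#!/bin/sh
# Sketch only: an 'ar' wrapper in the spirit of the one described above.
# det_ar_flags, det_ar and REAL_AR are hypothetical names; an ar that
# supports the 'D' modifier is assumed.

det_ar_flags () {
  # Rewrite ar's first argument: drop 'u' (update only newer members,
  # which relies on timestamps), drop any existing 'D' to avoid a
  # duplicate, then append 'D'.  Example: "cru" becomes "crD".
  flags=$(printf '%s' "$1" | tr -d 'uD')
  printf '%sD\n' "$flags"
}

det_ar () {
  # Run the real archiver with the rewritten first argument and the
  # remaining arguments untouched.
  first=$(det_ar_flags "$1")
  shift
  "${REAL_AR:-ar}" "$first" "$@"
}
```

Saved as a script that ends with `det_ar "$@"`, this could be passed to the build as AR, with RANLIB=/bin/true alongside it as suggested in the quoted message. The sketch assumes the first argument is always the operation/modifier string; long options given first would need extra handling.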
Re: [patch] sort 'find' output to enable deterministic builds.
Hi Ralf,

On Mon, Mar 15, 2010 at 23:37, Ralf Wildenhues wrote:
> libtool also uses 'ar' in a number of places and cases, most
> prominently, but not limited to, the static linking scenario.
> Do you have measures in place to use 'D' there too?

Yes. At least in all the places I've noticed (or that are relevant
for a Linux / ELF build), libtool actually uses $AR. Likewise with
$RANLIB.

If you want a repeatable build that avoids ar timestamps, you need:
AR=... wrapper script that adds 'D' and strips 'u' from ar's first arg ...
RANLIB=/bin/true or wrapper script that invokes /bin/true.

(I didn't grep through all of the libtool code, but AFAICT at least
ltmain.sh uses $AR and $RANLIB in preference to hard-coding ar and
ranlib ... which makes sense, since, for example, it's gotta be
cross-compilation friendly.)

> IIRC some (non-GNU) compilers (or the assemblers they call, I don't
> recall) also add time stamps to generated objects. Are they relevant to
> you? I'm not sure whether one can turn stamps off everywhere; have you
> looked into normalizing output like in GCC's contrib/compare-debug?

In my experience, it's the object formats. E.g., COFF objects
(including ECOFF and PE) include a timestamp. Looks like BFD puts in
0 for that in most cases, but e.g. the bfd/peXXigen.c code looks like
it *will* generate a timestamp. Those object file formats are not
relevant to me, though I've seen environments before where they're
overridden. (I once wrote a small set of tools to do a
timestamp-insensitive compare for windows builds. I've thankfully
been *mostly* COFF-free for the past 10+ years. 8-)

(BTW, my purpose isn't to be able to compare output files so much as
create bit-identical output files. There are several reasons for
this, but in summary, comparison isn't quite good enough. In order to
do this, you've gotta do a bunch of other things, e.g., build with the
same paths, use -frandom-seed appropriately if building some C++ code,
...)

> Then, it would be nice to be able to confirm that we produce stable
> output in cases where this is desirable; IOW, have testsuite exposure.
> Can you describe your setup in a bit more detail? (You could also
> provide a test case, but that will probably require copyright papers
> then first.)

Actually, my employer has a blanket assignment AFAIU, so that's not a
problem.

My test case is a GCC build, built on three different host system types
(Ubuntu Dapper, Ubuntu Hardy, and Ubuntu pre-Lucid). libstdc++ is big
enough and the filesystems different enough w.r.t. 'find' behavior that
I get different orders on different systems. Not an easily-rolled-up
test case. 8-)

The problem with writing a test case is that find's traversal order is
not guaranteed. It depends on a whole bunch of things, but most
especially the implementation of readdir (or whatever similar interface
it uses to read dir entries -- I haven't looked inside of GNU find ...
maybe ever 8-), which often depends on the underlying file system.

e.g., on my (Hardy) workstation:

# /tmp is on the ext3 root file system:
[...@cgda v14]$ mkdir /tmp/a
[...@cgda v14]$ touch /tmp/a/a.o
[...@cgda v14]$ touch /tmp/a/b.o
[...@cgda v14]$ mkdir /tmp/b
[...@cgda v14]$ touch /tmp/b/b.o
[...@cgda v14]$ touch /tmp/b/a.o
[...@cgda v14]$ find /tmp/a
/tmp/a
/tmp/a/a.o
/tmp/a/b.o
[...@cgda v14]$ find /tmp/b
/tmp/b
/tmp/b/a.o
/tmp/b/b.o

# ~/tmp is on NFS.
[...@cgda v14]$ mkdir ~/tmp/a ~/tmp/b
[...@cgda v14]$ touch ~/tmp/a/a.o
[...@cgda v14]$ touch ~/tmp/a/b.o
[...@cgda v14]$ touch ~/tmp/b/b.o
[...@cgda v14]$ touch ~/tmp/b/a.o
[...@cgda v14]$ find ~/tmp/a
/home/cgd/tmp/a
/home/cgd/tmp/a/a.o
/home/cgd/tmp/a/b.o
[...@cgda v14]$ find ~/tmp/b
/home/cgd/tmp/b
/home/cgd/tmp/b/b.o
/home/cgd/tmp/b/a.o

(Note how in /tmp, things came out in a consistent order, and in ~/tmp
things came out in the order that they were created.)

This makes it harder to make a test case where you're *sure* that the
test is actually effective.

Probably the easiest test likely to expose the problem would be...

* create an archive (normal or libtool, I don't know what ltmain's
  func_extract_archive wants to operate on) with members b.o and a.o (in
  that order)
* convince ltmain to generate a new archive including them

then verify that the new archive has them in the order a.o and b.o.
but obviously that wouldn't actually test anything in many circumstances
(e.g., my /tmp example above).

(AFAICT it doesn't look like libtool ever has to deal with
indexless/ancient-BSD archives, where you had to use lorder|tsort to
order the archive members... that's good because that would make the
test a lot more difficult. 8-)

Unfortunately, to be honest... I don't really have a clue about how to
implement a test case like this. I've never actually attempted to use
libtool to do anything (e.g., create archives)...

Also complicating the issue is that I really don't know thing one about
libtool, e.g., how to make it do what I want in this case. Getting a
Re: [patch] sort 'find' output to enable deterministic builds.
Hello Chris,

thanks for the report and patch.

* Chris Demetriou wrote on Mon, Mar 15, 2010 at 08:12:31PM CET:
> A project that I work on wants to make sure our builds are
> deterministic, i.e., same input sources -> same exact output binaries.
> We've solved several problems (e.g., I've added the ar 'D' flag)
> related to this. The last remaining issue is in libtool.

libtool also uses 'ar' in a number of places and cases, most
prominently, but not limited to, the static linking scenario.
Do you have measures in place to use 'D' there too?

IIRC some (non-GNU) compilers (or the assemblers they call, I don't
recall) also add time stamps to generated objects. Are they relevant to
you? I'm not sure whether one can turn stamps off everywhere; have you
looked into normalizing output like in GCC's contrib/compare-debug?

Then, it would be nice to be able to confirm that we produce stable
output in cases where this is desirable; IOW, have testsuite exposure.
Can you describe your setup in a bit more detail? (You could also
provide a test case, but that will probably require copyright papers
then first.)

Thanks,
Ralf

> 2010-03-15  Chris Demetriou
>
>     Sort output of 'find' to enable deterministic builds.
>     * libltdl/config/ltmain.m4sh (func_extract_archives): Sort
>     output of 'find'.
>     * libltdl/m4/libtool.m4 (_LT_LANG_CXX_CONFIG): Likewise.
Re: [patch] sort 'find' output to enable deterministic builds.
I *think* it's sufficient as-is.

LC_ALL (and several other variables) are set via
libltdl/config/ltmain.m4sh circa line 114.

Looks like configure (which is ultimately where libtool.m4 gets
included) does it too, via autoconf _AS_SHELL_SANITIZE... I can't find
the exact path that causes it to be included, but setting LC_ALL is at
the start of every autoconf script I can find.

(And oh, my build scripts -- independent of all of the above -- do it,
too, so *I'm* covered N different ways... 8-)

chris

On Mon, Mar 15, 2010 at 18:12, Eric Blake wrote:
> On 03/15/2010 01:12 PM, Chris Demetriou wrote:
>> The attached patch sorts the output of 'find' in all cases (where it's
>> not already sorted), so as to produce deterministic results.
>
> Does libtool globally force the C locale, or do you need to use:
>
> LC_ALL=C sort
>
> throughout your patch to further ensure deterministic behavior across
> different default locales?
>
> --
> Eric Blake ebl...@redhat.com +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
Re: [patch] sort 'find' output to enable deterministic builds.
On 03/15/2010 01:12 PM, Chris Demetriou wrote:
> The attached patch sorts the output of 'find' in all cases (where it's
> not already sorted), so as to produce deterministic results.

Does libtool globally force the C locale, or do you need to use:

LC_ALL=C sort

throughout your patch to further ensure deterministic behavior across
different default locales?

--
Eric Blake ebl...@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
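To illustrate why the locale question matters: 'sort' collates according to the current locale, so the same list of names can come out in a different order on a machine with different locale settings, while forcing LC_ALL=C gives plain byte order everywhere. A minimal sketch follows; the function name sorted_c is made up for this example.

```shell
#!/bin/sh
# Sketch only: sorted_c is a hypothetical helper name.

sorted_c () {
  # Sort stdin in the C locale, i.e. by byte value, independent of
  # whatever LANG / LC_COLLATE the caller happens to have set.
  LC_ALL=C sort
}

# In the C locale, upper-case ASCII letters sort before lower-case ones.
printf '%s\n' b.o A.o a.o B.o | sorted_c
```

Under many non-C locales a plain 'sort' would interleave upper and lower case instead, which is exactly the kind of host-dependent difference a deterministic build wants to rule out.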
[patch] sort 'find' output to enable deterministic builds.
A project that I work on wants to make sure our builds are
deterministic, i.e., same input sources -> same exact output binaries.
We've solved several problems (e.g., I've added the ar 'D' flag)
related to this. The last remaining issue is in libtool.

libtool uses 'find' to identify lists of objects to include, in certain
cases. Unfortunately, the output of 'find' can vary (e.g., with file
system type, order in which objects were created, ...), which
introduces non-determinism into my builds.

The attached patch sorts the output of 'find' in all cases (where it's
not already sorted), so as to produce deterministic results.

For my particular build, I only *need* the second hunk in ltmain.m4sh
(my_oldobjs=...) but I audited all of the libtool sources for other uses
of find that might result in unsorted output and adjusted them as well.

I can only easily test on an Ubuntu Hardy system. On that, with this
change applied, 'make check' passes.

thanks,

chris
---
2010-03-15  Chris Demetriou

    Sort output of 'find' to enable deterministic builds.
    * libltdl/config/ltmain.m4sh (func_extract_archives): Sort
    output of 'find'.
    * libltdl/m4/libtool.m4 (_LT_LANG_CXX_CONFIG): Likewise.

diff --git a/libltdl/config/ltmain.m4sh b/libltdl/config/ltmain.m4sh
index 8fcedc9..f2c35b6 100644
--- a/libltdl/config/ltmain.m4sh
+++ b/libltdl/config/ltmain.m4sh
@@ -2311,7 +2311,7 @@ func_extract_archives ()
   darwin_file=
   darwin_files=
   for darwin_file in $darwin_filelist; do
-    darwin_files=`find unfat-$$ -name $darwin_file -print | $NL2SP`
+    darwin_files=`find unfat-$$ -name $darwin_file -print | sort | $NL2SP`
     $LIPO -create -output "$darwin_file" $darwin_files
   done # $darwin_filelist
   $RM -rf unfat-$$
@@ -2326,7 +2326,7 @@ func_extract_archives ()
     func_extract_an_archive "$my_xdir" "$my_xabs"
     ;;
   esac
-  my_oldobjs="$my_oldobjs "`find $my_xdir -name \*.$objext -print -o -name \*.lo -print | $NL2SP`
+  my_oldobjs="$my_oldobjs "`find $my_xdir -name \*.$objext -print -o -name \*.lo -print | sort | $NL2SP`
   done
 
   func_extract_archives_result="$my_oldobjs"
diff --git a/libltdl/m4/libtool.m4 b/libltdl/m4/libtool.m4
index 677505d..d74038f 100644
--- a/libltdl/m4/libtool.m4
+++ b/libltdl/m4/libtool.m4
@@ -6001,20 +6001,20 @@ if test "$_lt_caught_CXX_error" != yes; then
   _LT_TAGVAR(prelink_cmds, $1)='tpldir=Template.dir~
     rm -rf $tpldir~
     $CC --prelink_objects --instantiation_dir $tpldir $objs $libobjs $compile_deplibs~
-    compile_command="$compile_command `find $tpldir -name \*.o | $NL2SP`"'
+    compile_command="$compile_command `find $tpldir -name \*.o | sort | $NL2SP`"'
   _LT_TAGVAR(old_archive_cmds, $1)='tpldir=Template.dir~
     rm -rf $tpldir~
     $CC --prelink_objects --instantiation_dir $tpldir $oldobjs$old_deplibs~
-    $AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name \*.o | $NL2SP`~
+    $AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name \*.o | sort | $NL2SP`~
     $RANLIB $oldlib'
   _LT_TAGVAR(archive_cmds, $1)='tpldir=Template.dir~
     rm -rf $tpldir~
     $CC --prelink_objects --instantiation_dir $tpldir $predep_objects $libobjs $deplibs $convenience $postdep_objects~
-    $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'
+    $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'
   _LT_TAGVAR(archive_expsym_cmds, $1)='tpldir=Template.dir~
     rm -rf $tpldir~
     $CC --prelink_objects --instantiation_dir $tpldir $predep_objects $libobjs $deplibs $convenience $postdep_objects~
-    $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib'
+    $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib'
   ;;
   *) # Version 6 and above use weak symbols
   _LT_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname ${wl}$soname -o $lib'
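
The effect of each hunk can be tried outside libtool with a small stand-alone sketch. The function name list_objs is hypothetical, 'tr' is used here as a stand-in for libtool's $NL2SP filter (an assumption about what that variable expands to), and LC_ALL=C is added so the result does not depend on the caller's locale.

```shell
#!/bin/sh
# Sketch only: list_objs is a hypothetical name, and tr stands in for
# libtool's $NL2SP newline-to-space filter.

list_objs () {
  # Print the *.o files below directory $1 on one line, separated by
  # spaces, in a stable (byte-wise sorted) order regardless of the order
  # in which the file system hands back directory entries.
  find "$1" -name '*.o' -print | LC_ALL=C sort | tr '\n' ' '
}
```

Creating b.o before a.o in a scratch directory and calling list_objs on it lists a.o first on any file system, which is the property the patch is after. File names containing newlines or spaces are not handled, just as in the patched libtool commands.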