Source: ocaml
Version: 4.02.3-7.1
Severity: wishlist
Tags: patch
User: reproducible-bui...@lists.alioth.debian.org
Usertags: toolchain randomness

Hi,

currently, ocaml embeds the paths of temporary files created by a preprocessor into the debug output. This makes several source packages in Debian unreproducible. To see the effect, look for example at this diffoscope output of src:botch:

│ ├── data.tar.xz
│ │ ├── data.tar
│ │ │ ├── ./usr/lib/debug/.build-id/03/28382a2670552f3318cc61bdebc13bbeef8f2f.debug
│ │ │ │ ├── readelf --wide --symbols {}
│ │ │ │ │ @@ -56,15 +56,15 @@
│ │ │ │ │   52: 0000000000830838 0 NOTYPE LOCAL DEFAULT 25 caml_startup__9
│ │ │ │ │   53: 0000000000830868 0 NOTYPE LOCAL DEFAULT 25 caml_startup__10
│ │ │ │ │   54: 0000000000830898 0 NOTYPE LOCAL DEFAULT 25 caml_startup__11
│ │ │ │ │   55: 00000000008308c8 0 NOTYPE LOCAL DEFAULT 25 caml_startup__12
│ │ │ │ │   56: 0000000000000000 0 FILE LOCAL DEFAULT ABS std_exit.ml
│ │ │ │ │   57: 00000000005c4430 0 NOTYPE LOCAL DEFAULT 15 caml_negf_mask
│ │ │ │ │   58: 00000000005c4440 0 NOTYPE LOCAL DEFAULT 15 caml_absf_mask
│ │ │ │ │ - 59: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlpp29daa7
│ │ │ │ │ + 59: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlpp4dfb7e
│ │ │ │ │   60: 00000000005c4450 0 NOTYPE LOCAL DEFAULT 15 caml_negf_mask
│ │ │ │ │   61: 00000000005c4460 0 NOTYPE LOCAL DEFAULT 15 caml_absf_mask
│ │ │ │ │   62: 0000000000836558 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__30
│ │ │ │ │   63: 0000000000836570 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__31
│ │ │ │ │   64: 0000000000836588 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__32
│ │ │ │ │   65: 00000000008365c0 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__2
│ │ │ │ │   66: 0000000000836668 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__5
│ │ │ │ │ @@ -87,15 +87,15 @@
│ │ │ │ │   83: 00000000008367e0 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__23
│ │ │ │ │   84: 00000000008367f8 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__24
│ │ │ │ │   85: 0000000000836810 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__25
│ │ │ │ │   86: 0000000000836820 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__26
│ │ │ │ │   87: 0000000000836868 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__27
│ │ │ │ │   88: 0000000000836880 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__28
│ │ │ │ │   89: 00000000008368c8 0 NOTYPE LOCAL DEFAULT 25 camlAnnotate$2dstrong__29
│ │ │ │ │ - 90: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlpp21639f
│ │ │ │ │ + 90: 0000000000000000 0 FILE LOCAL DEFAULT ABS /tmp/ocamlppfd0623
│ │ │ │ │   91: 00000000005c4470 0 NOTYPE LOCAL DEFAULT 15 caml_negf_mask
│ │ │ │ │   92: 00000000005c4480 0 NOTYPE LOCAL DEFAULT 15 caml_absf_mask
│ │ │ │ │   93: 0000000000836fc0 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__43
│ │ │ │ │   94: 0000000000836fd8 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__44
│ │ │ │ │   95: 0000000000836ff0 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__45
│ │ │ │ │   96: 0000000000837008 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__46
│ │ │ │ │   97: 0000000000837028 0 NOTYPE LOCAL DEFAULT 25 camlSrcGraphExtras__9

I see two ways to fix this problem.

 - instead of choosing a random temporary file name for the preprocessor
   output, choose a stable file name
 - do not include the path to the temporary file created by the preprocessor
   in the debug information

I prefer the latter option: the file is only temporary, so recording its path is useless anyway. Unfortunately, I was unable to figure out a good way to implement it. So instead, I implemented a solution that derives the path of the temporary file from the MD5 sum of the preprocessor name and the input file path. The idea is that running the same preprocessor on the same file path should produce the same output, so choosing the same file name should not pose any problem. I chose to calculate a hash instead of using the bare string values because file paths contain characters like the slash, which must not appear in file names, and because a hash gives a stable temporary file name length no matter how long the input path is.

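To illustrate the naming scheme on its own, here is a minimal standalone sketch. The function name stable_tmp_name and the example values "cppo" and "src/foo.ml" are assumptions made up for this illustration; they are not part of the compiler or of the proposed change. It only shows that the name depends on nothing but the two inputs and always has the same length.

```ocaml
(* Hypothetical helper for illustration only: derive a stable temporary
   file name from the preprocessor command and the source file path. *)
let stable_tmp_name ~temp_dir ~pp ~sourcefile =
  (* Digest.string computes an MD5 digest; Digest.to_hex renders it as
     32 hexadecimal characters, so no slash can appear in the name. *)
  let hash = Digest.to_hex (Digest.string (sourcefile ^ pp)) in
  Filename.concat temp_dir ("ocamlpp" ^ hash)

let () =
  (* Example inputs, chosen arbitrarily for the demonstration. *)
  let a = stable_tmp_name ~temp_dir:"/tmp" ~pp:"cppo" ~sourcefile:"src/foo.ml" in
  let b = stable_tmp_name ~temp_dir:"/tmp" ~pp:"cppo" ~sourcefile:"src/foo.ml" in
  (* The same preprocessor and source path always give the same name. *)
  assert (a = b);
  (* "ocamlpp" (7 characters) followed by 32 hexadecimal characters. *)
  assert (String.length (Filename.basename a) = 39);
  print_endline a
```

Unlike Filename.temp_file, a helper like this only computes a name; creating the file is a separate step.
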
Here is the patch:

--- a/driver/pparse.ml
+++ b/driver/pparse.ml
@@ -19,9 +19,17 @@ type error =
 exception Error of error
 
 (* Optionally preprocess a source file *)
+external open_desc: string -> open_flag list -> int -> int = "caml_sys_open"
+external close_desc: int -> unit = "caml_sys_close"
 
 let call_external_preprocessor sourcefile pp =
-  let tmpfile = Filename.temp_file "ocamlpp" "" in
+  (* do not use Filename.temp_file as the resulting temporary file name will be
+   * recorded in the debug output of the resulting binary and thus make the
+   * output random and unreproducible *)
+  let temp_dir = Filename.get_temp_dir_name () in
+  let hash = Digest.to_hex (Digest.string (sourcefile^pp)) in
+  let tmpfile = Filename.concat temp_dir ("ocamlpp"^hash) in
+  close_desc(open_desc tmpfile [Open_wronly; Open_creat; Open_excl] 0o600);
   let comm = Printf.sprintf "%s %s > %s"
                             pp (Filename.quote sourcefile) tmpfile
   in

Applying this patch and rebuilding src:ocaml leads to src:botch becoming reproducible.

I do not know whether the patch is suitable for inclusion into the upstream project but I trust that you forward the issue accordingly.

Thanks!

cheers, josch