Source: ocaml
Version: 4.02.3-7.1
Severity: wishlist
Tags: patch
User: reproducible-builds@lists.alioth.debian.org
Usertags: toolchain randomness

Hi,

currently, ocaml embeds the paths of the temporary files created by a
preprocessor into the debug information. This makes several source
packages in Debian unreproducible. To see the effect, look for example
at the FILE symbols in this diffoscope output of src:botch:

│   ├── data.tar.xz
│   │   ├── data.tar
│   │   │   ├── ./usr/lib/debug/.build-id/03/28382a2670552f3318cc61bdebc13bbeef8f2f.debug
│   │   │   │   ├── readelf --wide --symbols {}
│   │   │   │   │ @@ -56,15 +56,15 @@
│   │   │   │   │      52: 0000000000830838     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__9
│   │   │   │   │      53: 0000000000830868     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__10
│   │   │   │   │      54: 0000000000830898     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__11
│   │   │   │   │      55: 00000000008308c8     0 NOTYPE  LOCAL  DEFAULT   25 caml_startup__12
│   │   │   │   │      56: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS std_exit.ml
│   │   │   │   │      57: 00000000005c4430     0 NOTYPE  LOCAL  DEFAULT   15 caml_negf_mask
│   │   │   │   │      58: 00000000005c4440     0 NOTYPE  LOCAL  DEFAULT   15 caml_absf_mask
│   │   │   │   │ -    59: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlpp29daa7
│   │   │   │   │ +    59: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlpp4dfb7e
│   │   │   │   │      60: 00000000005c4450     0 NOTYPE  LOCAL  DEFAULT   15 caml_negf_mask
│   │   │   │   │      61: 00000000005c4460     0 NOTYPE  LOCAL  DEFAULT   15 caml_absf_mask
│   │   │   │   │      62: 0000000000836558     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__30
│   │   │   │   │      63: 0000000000836570     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__31
│   │   │   │   │      64: 0000000000836588     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__32
│   │   │   │   │      65: 00000000008365c0     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__2
│   │   │   │   │      66: 0000000000836668     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__5
│   │   │   │   │ @@ -87,15 +87,15 @@
│   │   │   │   │      83: 00000000008367e0     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__23
│   │   │   │   │      84: 00000000008367f8     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__24
│   │   │   │   │      85: 0000000000836810     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__25
│   │   │   │   │      86: 0000000000836820     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__26
│   │   │   │   │      87: 0000000000836868     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__27
│   │   │   │   │      88: 0000000000836880     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__28
│   │   │   │   │      89: 00000000008368c8     0 NOTYPE  LOCAL  DEFAULT   25 camlAnnotate$2dstrong__29
│   │   │   │   │ -    90: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlpp21639f
│   │   │   │   │ +    90: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /tmp/ocamlppfd0623
│   │   │   │   │      91: 00000000005c4470     0 NOTYPE  LOCAL  DEFAULT   15 caml_negf_mask
│   │   │   │   │      92: 00000000005c4480     0 NOTYPE  LOCAL  DEFAULT   15 caml_absf_mask
│   │   │   │   │      93: 0000000000836fc0     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__43
│   │   │   │   │      94: 0000000000836fd8     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__44
│   │   │   │   │      95: 0000000000836ff0     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__45
│   │   │   │   │      96: 0000000000837008     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__46
│   │   │   │   │      97: 0000000000837028     0 NOTYPE  LOCAL  DEFAULT   25 camlSrcGraphExtras__9

I see two ways to fix this problem.

 - instead of choosing a random temporary file name for the preprocessor
   output, choose a stable file name

 - do not include the path to the temporary file created by the
   preprocessor in the debug information

I prefer the latter option: the path is useless anyway, since the file
is only temporary. Unfortunately, I was unable to figure out a good way
to implement it.

So instead, I implemented a solution that derives the path of the
temporary file from the MD5 sum of the preprocessor name and the input
file path. The idea is that running the same preprocessor on the same
file path should produce the same output, so reusing the same file name
should not pose any problem. I chose to compute a hash instead of using
the bare string values because file paths contain characters like the
slash, which must not appear in file names, and because a hash gives
the temporary file name a fixed length no matter how long the input
path is.
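
To illustrate the naming scheme outside of the compiler, here is a
minimal standalone sketch. The function name stable_tmp_name, its
labelled arguments and the example values ("camlp4o", "src/foo.ml",
"/tmp") are hypothetical and only serve as illustration; the actual
change is the patch against driver/pparse.ml.

```ocaml
(* Sketch only: [stable_tmp_name] is a hypothetical helper, not a function
   from the OCaml compiler sources. It derives the temporary file name
   from the source path and the preprocessor command. *)
let stable_tmp_name ~temp_dir ~pp ~sourcefile =
  (* Digest.string computes an MD5 digest and Digest.to_hex renders it as
     32 hexadecimal characters, so the result contains no slashes and has
     a fixed length regardless of the input. *)
  let hash = Digest.to_hex (Digest.string (sourcefile ^ pp)) in
  Filename.concat temp_dir ("ocamlpp" ^ hash)

let () =
  (* Example inputs are assumptions chosen for illustration. *)
  let a = stable_tmp_name ~temp_dir:"/tmp" ~pp:"camlp4o" ~sourcefile:"src/foo.ml" in
  let b = stable_tmp_name ~temp_dir:"/tmp" ~pp:"camlp4o" ~sourcefile:"src/foo.ml" in
  let c = stable_tmp_name ~temp_dir:"/tmp" ~pp:"camlp4o" ~sourcefile:"src/bar.ml" in
  (* Same preprocessor and same source path give the same name ... *)
  assert (a = b);
  (* ... while a different source path gives a different name. *)
  assert (a <> c);
  print_endline a
```

One caveat of a deterministic name: creating the file with Open_excl
fails if a file of that name already exists, which could happen, for
example, if an earlier run was interrupted before the temporary file
was cleaned up.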

Here is the patch:

--- a/driver/pparse.ml
+++ b/driver/pparse.ml
@@ -19,9 +19,17 @@ type error =
 exception Error of error
 
 (* Optionally preprocess a source file *)
+external open_desc: string -> open_flag list -> int -> int = "caml_sys_open"
+external close_desc: int -> unit = "caml_sys_close"
 
 let call_external_preprocessor sourcefile pp =
-      let tmpfile = Filename.temp_file "ocamlpp" "" in
+      (* do not use Filename.temp_file as the resulting temporary file name will be
+       * recorded in the debug output of the resulting binary and thus make the
+       * output random and unreproducible *)
+      let temp_dir = Filename.get_temp_dir_name () in
+      let hash = Digest.to_hex (Digest.string (sourcefile^pp)) in
+      let tmpfile = Filename.concat temp_dir ("ocamlpp"^hash) in
+      close_desc(open_desc tmpfile [Open_wronly; Open_creat; Open_excl] 0o600);
       let comm = Printf.sprintf "%s %s > %s"
                                 pp (Filename.quote sourcefile) tmpfile
       in

Applying this patch and rebuilding src:ocaml leads to src:botch becoming
reproducible.

I do not know whether the patch is suitable for inclusion in the
upstream project, but I trust that you will forward the issue
accordingly.

Thanks!

cheers, josch
