Bug#978499: fop: reproducible builds: Support using SOURCE_DATE_EPOCH for timestamps in PDF files

2021-01-01 Thread Vagrant Cascadian
On 2021-01-01, Vagrant Cascadian wrote:
> On 2021-01-01, tony mancill wrote:
>> On Fri, Jan 01, 2021 at 11:17:46AM -0800, Vagrant Cascadian wrote:
>>> It seems so very, very close, xorg-docs now is only varying on the
>>> timezone, but otherwise respecting SOURCE_DATE_EPOCH:
>>> 
>>>   
>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/xorg-docs.html
>>> 
>>> 
>>> But what really confuses me is that "treeview" is still ignoring
>>> SOURCE_DATE_EPOCH entirely (e.g. timestamps showing up from 2022):
>>> 
>>>   
>>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/treeview.html

But it at least fixed one package!

  
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/gpsbabel.html


So that's three different packages with the whole range of ... not
fixed, partly fixed (e.g. timezone), and wholly fixed... :)

Most of the known affected packages appear to fall into the first two
categories, unfortunately.


live well,
  vagrant




Bug#978499: fop: reproducible builds: Support using SOURCE_DATE_EPOCH for timestamps in PDF files

2021-01-01 Thread Vagrant Cascadian
On 2021-01-01, tony mancill wrote:
> On Fri, Jan 01, 2021 at 11:17:46AM -0800, Vagrant Cascadian wrote:
>> It seems so very, very close, xorg-docs now is only varying on the
>> timezone, but otherwise respecting SOURCE_DATE_EPOCH:
>> 
>>   
>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/xorg-docs.html
>> 
>> 
>> But what really confuses me is that "treeview" is still ignoring
>> SOURCE_DATE_EPOCH entirely (e.g. timestamps showing up from 2022):
>> 
>>   
>> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/treeview.html
>> 
>> 
>> From the output, it looks like both packages are embedding the same type
>> of values...
>> 
>> I confirmed that both builds actually were using fop version 2.5-3,
>> according to the build logs. Almost makes me wonder if treeview somehow
>> has an invalid date in the changelog entry...
>
> In case it helps point out a hole in my testing strategy, I did my local
> testing by extracting the xorg-docs source package into 4 different
> directories and then building with sbuild, either against "pure" sid or
> as below to test fop 2.5-3 before the upload, and then running
> diffoscope against the resulting changes files:

> sbuild --chroot=sid-amd64-sbuild --extra-package=/path/to/fop_2.5-3_all.deb --extra-package=/path/to/libfop-java_2.5-3_all.deb

You might want to use two different chroots, one with timezone set to
UTC+14 and one with the timezone set to UTC-12. These are the widest
range of timezones actually in use in the real world, although in theory
you could do something crazy like UTC+0 and UTC+26.
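
A minimal sketch of why those two offsets matter (the class name and the
epoch value are made up for illustration): the same instant lands on
different calendar days at UTC+14 and UTC-12, so any date formatted in
local time will differ between the two chroots.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class TimezoneSpread {
    // Render an epoch-second value as wall-clock time at a fixed UTC offset.
    static String render(long epochSeconds, int offsetHours) {
        return DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm")
                .format(Instant.ofEpochSecond(epochSeconds)
                        .atOffset(ZoneOffset.ofHours(offsetHours)));
    }

    public static void main(String[] args) {
        long epoch = 1609459200L; // 2021-01-01 00:00:00 UTC, arbitrary example
        System.out.println(render(epoch, 14));  // 2021-01-01 14:00
        System.out.println(render(epoch, -12)); // 2020-12-31 12:00
    }
}
```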

Another thing to try would be to do one build in a virtual machine with
the clock adjusted ... e.g. "qemu -rtc base=2022-04-01" as well as using a
very different timezone.


> diffoscope c/xorg-docs_1.7.1-1.2_amd64.changes d/xorg-docs_1.7.1-1.2_amd64.changes
> --- c/xorg-docs_1.7.1-1.2_amd64.changes
> +++ d/xorg-docs_1.7.1-1.2_amd64.changes
> ├── Files
> │ @@ -1,6 +1,6 @@
> │  
> │   c3c9468c8de1825668386eb1c8131e4f 1132 doc optional xorg-docs_1.7.1-1.2.dsc
> │   f6a6ecca98d411d73492303db3190bca 13250 doc optional xorg-docs_1.7.1-1.2.diff.gz
> │   6939769b47ecad2875ae10674ed4db03 84224 doc optional xorg-docs-core_1.7.1-1.2_all.deb
> │   9baa141ec6258704be08b8873f8892c9 1160544 doc optional xorg-docs_1.7.1-1.2_all.deb
> │ - a09f88c06107005935ff6664eccfe306 7625 doc optional xorg-docs_1.7.1-1.2_amd64.buildinfo
> │ + 6664227d398e6954e41b5c4fc468448c 7625 doc optional xorg-docs_1.7.1-1.2_amd64.buildinfo
> ├── xorg-docs_1.7.1-1.2_amd64.buildinfo
> │ ├── Build-Date
> │ │ @@ -1 +1 @@
> │ │ -Thu, 31 Dec 2020 06:40:14 +
> │ │ +Thu, 31 Dec 2020 06:42:37 +
> │ ├── Build-Path
> │ │ @@ -1 +1 @@
> │ │ -/build/xorg-docs-By7U5v/xorg-docs-1.7.1
> │ │ +/build/xorg-docs-kFh9qq/xorg-docs-1.7.1
>  
>
> By comparing two separate 2.5-2 builds, I was able to confirm that 2.5-2
> only partially addressed the embedded timestamps in the PDFs, but the
> diff above looks like what we want, right?

Yes, a diff like that is what we're hoping for!


From what I've been reading, Java Date objects are internally stored as
UTC, but the functions that output the date apply the local timezone
unless explicitly asked for a different one. /o\
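
A small sketch of that behaviour, assuming nothing about fop itself (the
class and method names are invented for illustration): the same Date
formats differently depending on whether a zone is set on the formatter
or the JVM default is left in place.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DateZoneDemo {
    // Format a Date in the given zone; a null zone means "use the JVM default",
    // which is what SimpleDateFormat does when no zone is set explicitly.
    static String format(Date when, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        if (tz != null) {
            fmt.setTimeZone(tz);
        }
        return fmt.format(when);
    }

    public static void main(String[] args) {
        Date when = new Date(0L); // the Unix epoch
        System.out.println(format(when, null)); // varies with the machine's timezone
        System.out.println(format(when, TimeZone.getTimeZone("UTC"))); // 19700101000000
    }
}
```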

Thanks for sharing the pain!


live well,
  vagrant




Bug#978499: fop: reproducible builds: Support using SOURCE_DATE_EPOCH for timestamps in PDF files

2020-12-30 Thread tony mancill
On Wed, Dec 30, 2020 at 06:23:29PM -0800, Vagrant Cascadian wrote:
> On 2020-12-30, tony mancill wrote:
> > On Tue, Dec 29, 2020 at 11:13:48AM -0800, Vagrant Cascadian wrote:
> >> Thanks for the quick upload! unfortunately...
> >> 
> >> > For example, in xorg-docs:
> >> >
> >> >   
> >> > https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/amd64/diffoscope-results/xorg-docs.html
> >> >
> >> >   /usr/share/doc/xorg-docs/xlfd/xlfd.pdf.gz
> >> >   
> >> >   CreationDate:·"D:20201225182038-12'00'"
> >> >   vs.
> >> >   CreationDate:·"D:20220129025203+14'00'"
> >> 
> >> I rescheduled various builds after fop landed in unstable, and it
> >> appears to not fully fix the issue...
> >> 
> >> It clearly fixed the issue for me when building xorg-docs with reprotest
> >> locally, which does test time and timezone variations... but it uses
> >> faketime, which often behaves differently than a system with an adjusted
> >> running clock such as the tests.reproducible-builds.org infrastructure.
> >
> > Hrm indeed...
> >
> > For what it's worth, the diffoscope for bullseye (which doesn't have the
> > fix for fop in there yet) and unstable look different to me.  In
> > bullseye, the "CreationDate" differs, as expected.  But in
> > unstable, the difference is in CreateDate in the XML metadata about the
> > file.
> >
> > I think it's possible that we are falling into this block of
> > PDFMetadata.java [1]:
> >
> > //Set creation date if not available, yet
> > if (info.getCreationDate() == null) {
> >     Date d = new Date();
> >     info.setCreationDate(d);
> > }
> >
> > That would fit the symptoms.  In any event, in for a penny, in for a pound. 
> >  I think we can fix this by checking for null creationDate in PDFInfo.java 
> > [2] and once again using SOURCE_DATE_EPOCH if set.
> >
> > [1] 
> > https://salsa.debian.org/java-team/fop/-/blob/master/fop-core/src/main/java/org/apache/fop/pdf/PDFMetadata.java#L135-139
> > [2] 
> > https://salsa.debian.org/java-team/fop/-/blob/master/fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java#L190-195
> >
> > I have pushed a patch to wrap the original modification to PDFInfo.java in
> > a try/catch but haven't yet uploaded.  I'll amend that and do a little
> > reprotesting before uploading again.
> 
> Thanks for continuing to dive into this one! :)
> 
> Maybe this is a red herring, but I also noticed that in PDFInfo.java
> there are two definitions of the modified function with the same name...
> 
> (snip)
> 
> Or is there some java thing to handle functions with the same names?

Yes, it's a common pattern in Java.  The methods vary in their arguments
and so are distinct signatures.  In this case, the method that takes a
TimeZone as an argument is called by the other method of the same name
in PDFInfo *and* in PDFEmbeddedFile.  So... I went looking for all of
the invocations of new Date() in the fop code and found several other
methods where SOURCE_DATE_EPOCH should be checked.
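
A minimal sketch of the overloading pattern, using invented names rather
than the real fop methods: the one-argument method delegates to the
two-argument one, so a change made only in the one-argument method is
bypassed by any caller that uses the two-argument method directly.

```java
import java.util.TimeZone;

final class OverloadDemo {
    // Two-argument overload: does the real work.
    static String describe(long epochSeconds, TimeZone tz) {
        return epochSeconds + "@" + tz.getID();
    }

    // One-argument overload: same name, different parameter list.
    // It only supplies a default zone and delegates.
    static String describe(long epochSeconds) {
        return describe(epochSeconds, TimeZone.getTimeZone("UTC"));
    }

    public static void main(String[] args) {
        System.out.println(describe(5L));                              // 5@UTC
        System.out.println(describe(5L, TimeZone.getTimeZone("GMT"))); // 5@GMT
    }
}
```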

I have an updated patch for fop that addresses the issue with xorg-docs
and probably a few others too.  I'm going to let ratt chew on the build
r-deps before uploading, but expect to upload tomorrow.
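
The updated patch is not included in this message. Purely as an
illustration of the idea (the class and method names below are
hypothetical, not what fop uses), a single helper could stand in for each
"new Date()" call and fall back to the current time when
SOURCE_DATE_EPOCH is unset or malformed:

```java
import java.util.Date;

final class BuildClock {
    // Interpret a SOURCE_DATE_EPOCH-style value (seconds since the epoch).
    // Returns the fallback when the value is missing or not a usable number.
    static Date resolve(String sourceDateEpoch, Date fallback) {
        if (sourceDateEpoch == null) {
            return fallback;
        }
        try {
            long seconds = Long.parseLong(sourceDateEpoch.trim());
            return new Date(Math.multiplyExact(1000L, seconds));
        } catch (NumberFormatException | ArithmeticException e) {
            return fallback;
        }
    }

    // Hypothetical replacement for "new Date()" at call sites that stamp output.
    static Date now() {
        return resolve(System.getenv("SOURCE_DATE_EPOCH"), new Date());
    }
}
```

Taking the environment value as a parameter keeps the parsing testable
without having to change the process environment.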

Cheers,
tony



Bug#978499: fop: reproducible builds: Support using SOURCE_DATE_EPOCH for timestamps in PDF files

2020-12-30 Thread Vagrant Cascadian
On 2020-12-30, tony mancill wrote:
> On Tue, Dec 29, 2020 at 11:13:48AM -0800, Vagrant Cascadian wrote:
>> Thanks for the quick upload! unfortunately...
>> 
>> > For example, in xorg-docs:
>> >
>> >   
>> > https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/amd64/diffoscope-results/xorg-docs.html
>> >
>> >   /usr/share/doc/xorg-docs/xlfd/xlfd.pdf.gz
>> >   
>> >   CreationDate:·"D:20201225182038-12'00'"
>> >   vs.
>> >   CreationDate:·"D:20220129025203+14'00'"
>> 
>> I rescheduled various builds after fop landed in unstable, and it
>> appears to not fully fix the issue...
>> 
>> It clearly fixed the issue for me when building xorg-docs with reprotest
>> locally, which does test time and timezone variations... but it uses
>> faketime, which often behaves differently than a system with an adjusted
>> running clock such as the tests.reproducible-builds.org infrastructure.
>
> Hrm indeed...
>
> For what it's worth, the diffoscope for bullseye (which doesn't have the
> fix for fop in there yet) and unstable look different to me.  In
> bullseye, the "CreationDate" differs, as expected.  But in
> unstable, the difference is in CreateDate in the XML metadata about the
> file.
>
> I think it's possible that we are falling into this block of
> PDFMetadata.java [1]:
>
> //Set creation date if not available, yet
> if (info.getCreationDate() == null) {
>     Date d = new Date();
>     info.setCreationDate(d);
> }
>
> That would fit the symptoms.  In any event, in for a penny, in for a pound.  
> I think we can fix this by checking for null creationDate in PDFInfo.java [2] 
> and once again using SOURCE_DATE_EPOCH if set.
>
> [1] 
> https://salsa.debian.org/java-team/fop/-/blob/master/fop-core/src/main/java/org/apache/fop/pdf/PDFMetadata.java#L135-139
> [2] 
> https://salsa.debian.org/java-team/fop/-/blob/master/fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java#L190-195
>
> I have pushed a patch to wrap the original modification to PDFInfo.java in
> a try/catch but haven't yet uploaded.  I'll amend that and do a little
> reprotesting before uploading again.

Thanks for continuing to dive into this one! :)


Maybe this is a red herring, but I also noticed that in PDFInfo.java
there are two definitions of the modified function with the same name...

/**
 * Formats a date/time according to the PDF specification
 * (D:YYYYMMDDHHmmSSOHH'mm').
 * @param time date/time value to format
 * @param tz the time zone
 * @return the requested String representation
 */
protected static String formatDateTime(final Date time, TimeZone tz) {
    return DateFormatUtil.formatPDFDate(time, tz);
}

/**
 * Formats a date/time according to the PDF
 * specification. (D:YYYYMMDDHHmmSSOHH'mm').
 * @param time date/time value to format
 * @return the requested String representation
 */
protected static String formatDateTime(final Date time) {
    return formatDateTime(time, TimeZone.getDefault());
}


Or is there some Java feature for handling functions with the same name?



live well,
  vagrant




Bug#978499: fop: reproducible builds: Support using SOURCE_DATE_EPOCH for timestamps in PDF files

2020-12-27 Thread Vagrant Cascadian
Package: fop
Severity: normal
Tags: patch
User: reproducible-bui...@lists.alioth.debian.org
Usertags: timestamps toolchain
X-Debbugs-Cc: reproducible-b...@lists.alioth.debian.org

Several packages use fop to generate PDF files in Debian packages, but
the resulting PDF files embed timestamp information in the
CreationDate of the PDF:

  
https://tests.reproducible-builds.org/debian/issues/unstable/timestamps_in_pdf_generated_by_apache_fop_issue.html


For example, in xorg-docs:

  
https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/amd64/diffoscope-results/xorg-docs.html

  /usr/share/doc/xorg-docs/xlfd/xlfd.pdf.gz
  
  CreationDate:·"D:20201225182038-12'00'"
  vs.
  CreationDate:·"D:20220129025203+14'00'"
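
For reference, a small sketch that decodes the two CreationDate strings
above into UTC instants (the class and method names are illustrative and
not part of fop; the apostrophes in the offset are stripped before
parsing):

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class PdfDateDecode {
    // D: prefix, then year, month, day, hour, minute, second and a +HHmm offset.
    private static final DateTimeFormatter PDF_DATE =
            DateTimeFormatter.ofPattern("'D:'uuuuMMddHHmmssxx");

    // Convert a PDF date such as D:20201225182038-12'00' to a UTC instant.
    static Instant toInstant(String pdfDate) {
        // Drop the apostrophes so the offset reads as +HHmm or -HHmm.
        String cleaned = pdfDate.replace("'", "");
        return OffsetDateTime.parse(cleaned, PDF_DATE).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(toInstant("D:20201225182038-12'00'")); // 2020-12-26T06:20:38Z
        System.out.println(toInstant("D:20220129025203+14'00'")); // 2022-01-28T12:52:03Z
    }
}
```

The two values differ in the recorded instant as well as in the offset,
so both the clock and the timezone of the build leak into the PDF.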


The attached patch fixes this by adding support for the
SOURCE_DATE_EPOCH environment variable to fop, which embeds the
specified timestamp rather than the current time:

  https://reproducible-builds.org/docs/source-date-epoch/


Thanks for maintaining fop!


live well,
  vagrant
From 25826ea9c86d01a8392cf593b9aa93c72b469b19 Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian 
Date: Mon, 28 Dec 2020 02:48:21 +
Subject: [PATCH] PDFInfo.java: Support SOURCE_DATE_EPOCH environment variable.

https://reproducible-builds.org/docs/source-date-epoch/
---
 fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java b/fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java
index 3aa5d97..79f3f42 100644
--- a/fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java
+++ b/fop-core/src/main/java/org/apache/fop/pdf/PDFInfo.java
@@ -305,7 +305,14 @@ public class PDFInfo extends PDFObject {
      * @return the requested String representation
      */
     protected static String formatDateTime(final Date time) {
-        return formatDateTime(time, TimeZone.getDefault());
+        // https://reproducible-builds.org/docs/source-date-epoch/
+        String source_date_epoch = System.getenv("SOURCE_DATE_EPOCH");
+        if (source_date_epoch != null) {
+            Long sourcedate = (1000 * Long.parseLong(source_date_epoch));
+            return formatDateTime(new Date(sourcedate), TimeZone.getTimeZone("Etc/UTC"));
+        } else {
+            return formatDateTime(time, TimeZone.getDefault());
+        }
     }
 
     /**
-- 
2.20.1


