bug#27563: [PATCH v3 0/2] Make ghostscript reproducible.

2017-07-08 Thread Ludovic Courtès
Danny Milosavljevic  skribis:

> On Fri, 07 Jul 2017 19:51:10 +0200
> l...@gnu.org (Ludovic Courtès) wrote:
>
>> For CreationDate/ModDate, I think it should honor SOURCE_DATE_EPOCH as
>> in
>> .
>
> Really?  I've been leaving them off, too.  Especially because of this funny 
> comment in the upstream ghostscript:
>
> /* Initialize the IDs allocated at startup. */
> void
> pdf_initialize_ids(gx_device_pdf * pdev)
> {
> ...
> /*
>  * Acrobat Distiller sets CreationDate and ModDate to the current
>  * date and time, rather than (for example) %%CreationDate from the
>  * PostScript file.  We think this is wrong, but we do the same.
>  */
> {
> ... proceed to set CreationDate and ModDate to the current time.
> }
> }

I guess they hamper reproducibility if they’re always created?  In that
case, they need to follow SOURCE_DATE_EPOCH; if OTOH they’re only
created in specific cases that don’t matter much, we can leave them.
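
A minimal sketch of what honoring SOURCE_DATE_EPOCH for these two fields
could look like.  The helper names pdf_timestamp and pdf_format_date are
hypothetical and are not Ghostscript functions; the fallback to the current
time when the variable is unset or malformed is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not Ghostscript code): choose the timestamp used
 * for CreationDate and ModDate.  If SOURCE_DATE_EPOCH holds a valid
 * non-negative integer, use it; otherwise fall back to the current time. */
static time_t
pdf_timestamp(void)
{
    const char *epoch = getenv("SOURCE_DATE_EPOCH");

    if (epoch != NULL && *epoch != '\0') {
        char *end;
        long long seconds;

        errno = 0;
        seconds = strtoll(epoch, &end, 10);
        if (errno == 0 && *end == '\0' && seconds >= 0)
            return (time_t) seconds;
    }
    return time(NULL);
}

/* Format WHEN as a PDF date string in UTC, e.g. D:20170707000000Z.
 * Returns 0 on success, -1 if the conversion or formatting fails. */
static int
pdf_format_date(time_t when, char *buf, size_t size)
{
    struct tm *utc = gmtime(&when);

    if (utc == NULL)
        return -1;
    return strftime(buf, size, "D:%Y%m%d%H%M%SZ", utc) > 0 ? 0 : -1;
}
```

With SOURCE_DATE_EPOCH set by the build environment, two builds of the same
source would then write identical date strings.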

>> For the two UUIDs (and “ID” too?), maybe we can use, say,
>> GS_GENERATE_UUIDS; if set to 0 or “no” it’s disabled, otherwise it’s
>> enabled.
>
> That would look like this:
>
> if (!getenv("GS_GENERATE_UUIDS")
>     || strcmp(getenv("GS_GENERATE_UUIDS"), "0") == 0
>     || strcmp(getenv("GS_GENERATE_UUIDS"), "no") == 0) ...

Yes.

Thanks!

Ludo’.





2017-07-07 Thread Ludovic Courtès
Danny Milosavljevic  skribis:

> On Fri, 07 Jul 2017 17:18:15 +0200
> l...@gnu.org (Ludovic Courtès) wrote:
>
>> OK, makes sense.  Maybe we can still have it disabled (or enabled) by
>> environment variable
>
> Sure.  Any suggestions for the name of the environment variable?

For CreationDate/ModDate, I think it should honor SOURCE_DATE_EPOCH as
in
.

For the two UUIDs (and “ID” too?), maybe we can use, say,
GS_GENERATE_UUIDS; if set to 0 or “no” it’s disabled, otherwise it’s
enabled.

> Also, where would we set it so the build processes of all the other
> packages actually pick it up?

Eventually we can add it to gnu-build-system.scm, but for now, given
that core-updates is well built, we should add it on a case-by-case
basis.  I don’t think there are that many packages that produce PDFs,
but I could be wrong.

How does that sound?

Thank you,
Ludo’.





2017-07-07 Thread Danny Milosavljevic
On Fri, 07 Jul 2017 19:51:10 +0200
l...@gnu.org (Ludovic Courtès) wrote:

> For CreationDate/ModDate, I think it should honor SOURCE_DATE_EPOCH as
> in
> .

Really?  I've been leaving them off, too.  Especially because of this funny 
comment in the upstream ghostscript:

/* Initialize the IDs allocated at startup. */
void
pdf_initialize_ids(gx_device_pdf * pdev)
{
...
/*
 * Acrobat Distiller sets CreationDate and ModDate to the current
 * date and time, rather than (for example) %%CreationDate from the
 * PostScript file.  We think this is wrong, but we do the same.
 */
{
... proceed to set CreationDate and ModDate to the current time.
}
}

> For the two UUIDs (and “ID” too?), maybe we can use, say,
> GS_GENERATE_UUIDS; if set to 0 or “no” it’s disabled, otherwise it’s
> enabled.

That would look like this:

if (!getenv("GS_GENERATE_UUIDS")
    || strcmp(getenv("GS_GENERATE_UUIDS"), "0") == 0
    || strcmp(getenv("GS_GENERATE_UUIDS"), "no") == 0) ...
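
The same check can be sketched as a small helper that calls getenv only
once.  The function name gs_generate_uuids_enabled is hypothetical.  The
condition above groups the unset case with "0" and "no", so this sketch
assumes all three mean "do not generate"; that default is an assumption and
could just as well be flipped.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Ghostscript code): returns true when document
 * and instance UUIDs should be generated.  GS_GENERATE_UUIDS is the
 * variable name proposed in this thread. */
static bool
gs_generate_uuids_enabled(void)
{
    const char *value = getenv("GS_GENERATE_UUIDS");

    /* Assumption: unset is treated like an explicit "0" or "no". */
    if (value == NULL)
        return false;
    if (strcmp(value, "0") == 0 || strcmp(value, "no") == 0)
        return false;

    /* Any other value enables UUID generation. */
    return true;
}
```

Reading the variable into a local first avoids repeating the getenv call
and keeps the NULL check in one place.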

> > Also, where would we set it so the build processes of all the other
> > packages actually pick it up?  
> 
> Eventually we can add it to gnu-build-system.scm, but for now, given
> that core-updates is well built, we should add it on a case-by-case
> basis.  I don’t think there are that many packages that produce PDFs,
> but I could be wrong.

Okay :)






2017-07-07 Thread Ludovic Courtès
Danny Milosavljevic  skribis:

>> .
>
> Hmm... can you access the patch linked there (under "Solution") ?

It’s 404, but Leo sent a link to the patch on debian.org.

Ludo’.





2017-07-07 Thread Danny Milosavljevic
On Fri, 07 Jul 2017 17:18:15 +0200
l...@gnu.org (Ludovic Courtès) wrote:

> OK, makes sense.  Maybe we can still have it disabled (or enabled) by
> environment variable

Sure.  Any suggestions for the name of the environment variable?  Also, where 
would we set it so the build processes of all the other packages actually pick 
it up?

Would it disable and re-enable all of these things at once?

* CreationDate
* ModDate
* /ID
* XMP DocumentUUID
* XMP InstanceUUID





2017-07-07 Thread Danny Milosavljevic
> .

Hmm... can you access the patch linked there (under "Solution") ?





2017-07-07 Thread Ludovic Courtès
Danny Milosavljevic  skribis:

> Hi Ludo,
>
> On Fri, 07 Jul 2017 14:00:09 +0200
> l...@gnu.org (Ludovic Courtès) wrote:
>
>> Danny Milosavljevic  skribis:
>> 
>> > Also, newer PDF files have an RDF header specifying some extra information
>> > in an XML-like format.  For example there's an instance UUID (PDF/A specifies
>> > that it's recommended to set this to an empty string), and a document UUID.
>> > The latter again is time-based.
>> 
>> If it’s time-based, then the solution may be to honor SOURCE_DATE_EPOCH.
>
> Upstream says definitely not.  The UUIDs are supposed to be unique and they 
> don't want anyone writing fixed UUIDs into documents (except for "" for the 
> instance ID which they themselves do).
>
> I think there could be some enterprise search engine which associates a 
> document with other resources using the document UUID - and if everyone went 
> and reused UUIDs it would be very confused.
>
> That's why I left it off.

OK, makes sense.  Maybe we can still have it disabled (or enabled) by
environment variable instead of having it removed wholesale?

Ludo’.





2017-07-07 Thread Danny Milosavljevic
Hi Ludo,

On Fri, 07 Jul 2017 14:00:09 +0200
l...@gnu.org (Ludovic Courtès) wrote:

> Danny Milosavljevic  skribis:
> 
> > Also, newer PDF files have an RDF header specifying some extra information
> > in an XML-like format.  For example there's an instance UUID (PDF/A specifies
> > that it's recommended to set this to an empty string), and a document UUID.
> > The latter again is time-based.
> 
> If it’s time-based, then the solution may be to honor SOURCE_DATE_EPOCH.

Upstream says definitely not.  The UUIDs are supposed to be unique and they 
don't want anyone writing fixed UUIDs into documents (except for "" for the 
instance ID which they themselves do).

I think there could be some enterprise search engine which associates a 
document with other resources using the document UUID - and if everyone went 
and reused UUIDs it would be very confused.

That's why I left it off.





2017-07-07 Thread Ludovic Courtès
Danny Milosavljevic  skribis:

> Also, newer PDF files have an RDF header specifying some extra information
> in an XML-like format.  For example there's an instance UUID (PDF/A specifies
> that it's recommended to set this to an empty string), and a document UUID.
> The latter again is time-based.

If it’s time-based, then the solution may be to honor SOURCE_DATE_EPOCH.

I asked on #reproducible-builds (OFTC).  A patch had been proposed
upstream but rejected:

  http://bugs.ghostscript.com/show_bug.cgi?id=696765

See also
.

Ludo’.





2017-07-06 Thread Danny Milosavljevic
So this is what's needed to finally make ghostscript, netpbm and groff
reproducible.  Groff just finished its 38th build on my machine and it
finally compared the rounds as equal.

I'm posting those here in order to make sure we all agree that this is
the way to go.

The patchset patches PDF creation in ghostscript.  It's for core-updates.

The PDF file has a trailer field "/ID" which is required only when
encrypting.  But ghostscript derives it from the current time.
So I figured leaving it off if allowed would be the easiest fix.
If it's not there then it can't change :P

Also, newer PDF files have an RDF header specifying some extra information
in an XML-like format.  For example there's an instance UUID (PDF/A specifies
that it's recommended to set this to an empty string), and a document UUID.
The latter again is time-based.

This patchset
* removes the RDF tag which contains the document UUID and
* sets the instance UUID to "" and
* removes the ID tag if allowed (i.e. if not encrypting).

Because of the printf-style functions, it has to split up the printfs a bit,
but really it just makes one of the printed parts optional, and it does so in
multiple places (because PDF trailers can be chained).
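
To illustrate the printf splitting described above, here is a simplified
sketch.  The function write_trailer and its parameters are hypothetical and
much simpler than the real Ghostscript code; it only shows the idea of
turning one format string into three calls so that the /ID entry can be
skipped when the document is not encrypted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified trailer writer (not Ghostscript code).
 * A single fprintf that always emitted /ID is split into three parts;
 * the middle part is printed only when encrypting, since /ID is
 * required only in that case. */
static void
write_trailer(FILE *out, int size, int root_obj, int info_obj,
              bool encrypting, const char *id_hex)
{
    /* Part 1: entries that are always present. */
    fprintf(out, "trailer\n<< /Size %d /Root %d 0 R /Info %d 0 R\n",
            size, root_obj, info_obj);

    /* Part 2: the now-optional /ID entry. */
    if (encrypting)
        fprintf(out, "/ID [<%s><%s>]\n", id_hex, id_hex);

    /* Part 3: close the trailer dictionary. */
    fprintf(out, ">>\n");
}
```

Without encryption the output contains no time-derived /ID, so nothing in
the trailer changes from one build to the next.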

Danny Milosavljevic (2):
  gnu: ghostscript: Don't write document UUID; use "" as instance UUID.
  gnu: ghostscript: Write document ID only when encrypting.

 gnu/local.mk   |  2 +
 gnu/packages/ghostscript.scm   |  4 +-
 .../patches/ghostscript-no-header-id.patch | 47 ++
 .../patches/ghostscript-no-header-uuid.patch   | 28 +
 4 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 gnu/packages/patches/ghostscript-no-header-id.patch
 create mode 100644 gnu/packages/patches/ghostscript-no-header-uuid.patch