Re: make git ignore the timestamp embedded in PDFs
Hi Hannes,

thanks for taking this up, and sorry for the long delay in my answer.

Johannes Sixt <j...@kdbg.org> writes:

> On 14.05.2013 15:17, Andreas Leha wrote:
>> Hi all,
>>
>> how can I make git ignore the time stamp(s) in a PDF? Two PDFs that
>> differ only in these time stamps should be considered identical.
>> ...
>> What I tried is a filter:
>> ,[ ~/.gitconfig ]
>> | [filter "pdfresetdate"]
>> |     clean = pdfresetdate
>> `
>>
>> This 'works' as far as the committed pdf indeed has the date reset to
>> my default value. However, when I re-checkout the files, they are
>> marked modified by git.
>
> I'm using cleaned files every now and then, but not on Linux. I have
> never observed this behavior recently.
>
> If you 'git add' the file, does it keep its modified state?

yes.

> Does 'git diff' tell a difference?

no.

Here is a complete 'session':
,
| mkdir test
| cd test
| git init
| echo '*.pdf filter=pdfresetdate' > .gitattributes
| cp ~/PDF/score_table.pdf .
| pdfinfo score_table.pdf
| Title:          (score_table)
| Author:         (andreas)
| Creator:        GPL Ghostscript 905 (ps2write)
| Producer:       GPL Ghostscript 9.05
| CreationDate:   Fri Feb 8 15:44:47 2013
| ModDate:        Fri Feb 8 15:44:47 2013
| Tagged:         no
| Pages:          1
| Encrypted:      no
| Page size:      595 x 842 pts (A4)
| File size:      36989 bytes
| Optimized:      no
| PDF version:    1.4
| git add score_table.pdf
| pdfinfo score_table.pdf
| Title:          (score_table)
| Author:         (andreas)
| Creator:        GPL Ghostscript 905 (ps2write)
| Producer:       GPL Ghostscript 9.05
| CreationDate:   Fri Feb 8 15:44:47 2013
| ModDate:        Fri Feb 8 15:44:47 2013
| Tagged:         no
| Pages:          1
| Encrypted:      no
| Page size:      595 x 842 pts (A4)
| File size:      36989 bytes
| Optimized:      no
| PDF version:    1.4
| git commit -m test
| pdfinfo score_table.pdf
| Title:          (score_table)
| Author:         (andreas)
| Creator:        GPL Ghostscript 905 (ps2write)
| Producer:       GPL Ghostscript 9.05
| CreationDate:   Fri Feb 8 15:44:47 2013
| ModDate:        Fri Feb 8 15:44:47 2013
| Tagged:         no
| Pages:          1
| Encrypted:      no
| Page size:      595 x 842 pts (A4)
| File size:      36989 bytes
| Optimized:      no
| PDF version:    1.4
| rm score_table.pdf
| git checkout score_table.pdf
| git status
| # On branch master
| # Changes not staged for commit:
| #   (use "git add <file>..." to update what will be committed)
| #   (use "git checkout -- <file>..." to discard changes in working directory)
| #
| #       modified:   score_table.pdf
| #
| # Untracked files:
| #   (use "git add <file>..." to include in what will be committed)
| #
| #       .gitattributes
| no changes added to commit (use "git add" and/or "git commit -a")
| pdfinfo score_table.pdf
| Title:          (score_table)
| Author:         (andreas)
| Creator:        GPL Ghostscript 905 (ps2write)
| Producer:       GPL Ghostscript 9.05
| CreationDate:   Mon Jan 1 07:26:19 1979
| ModDate:        Mon Jan 1 07:26:19 1979
| Tagged:         no
| Pages:          1
| Encrypted:      no
| Page size:      595 x 842 pts (A4)
| File size:      37126 bytes
| Optimized:      no
| PDF version:    1.4
| git add score_table.pdf
| git status
| # On branch master
| # Changes to be committed:
| #   (use "git reset HEAD <file>..." to unstage)
| #
| #       modified:   score_table.pdf
| #
| # Untracked files:
| #   (use "git add <file>..." to include in what will be committed)
| #
| #       .gitattributes
| git diff score_table.pdf
`

Regards,
Andreas

-- 
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: make git ignore the timestamp embedded in PDFs
On 18.05.2013 09:42, Andreas Leha wrote:
>> On 14.05.2013 15:17, Andreas Leha wrote:
>>> Hi all,
>>>
>>> how can I make git ignore the time stamp(s) in a PDF? Two PDFs that
>>> differ only in these time stamps should be considered identical.
>>> ...
>>> What I tried is a filter:
>>> ,[ ~/.gitconfig ]
>>> | [filter "pdfresetdate"]
>>> |     clean = pdfresetdate
>>> `
>>>
>>> This 'works' as far as the committed pdf indeed has the date reset to
>>> my default value. However, when I re-checkout the files, they are
>>> marked modified by git.
>>
>> I'm using cleaned files every now and then, but not on Linux. I have
>> never observed this behavior recently.
>>
>> If you 'git add' the file, does it keep its modified state?
>
> yes.
>
>> Does 'git diff' tell a difference?
>
> no.

I do not believe you. I'm sure that "Binary files differ" was reported.

The reason is that your pdfresetdate script is not idempotent. Look:

$ pdfresetdate < x.pdf > y.pdf
$ pdfresetdate < y.pdf > z.pdf
$ md5sum x.pdf y.pdf z.pdf
c46a7097574a035e89d1a46d93c83528  x.pdf
8e6d942b4cc7d8a4dfe6898867573617  y.pdf
e6333bc0f8ab9781d3e1d811a392d516  z.pdf

A file that was already cleaned by the clean filter must not be modified,
i.e., y.pdf and z.pdf should be identical. But they are not.

Fix your clean filter.

-- 
Hannes
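Hannes's test can be wrapped in a small helper for checking any clean filter before wiring it into git. This is a sketch: the function name `is_idempotent` is hypothetical, and it assumes the filter reads the file on stdin and writes the cleaned result to stdout, the way git runs clean filters. It cleans the input once, cleans that output again, and succeeds only when both results are byte-for-byte identical (the y.pdf/z.pdf comparison in the message above).

```shell
# is_idempotent FILTER INPUT
# Hypothetical helper: runs FILTER (a shell command reading stdin and
# writing stdout) on INPUT, then runs it again on its own output.
# Returns 0 if both outputs are identical, 1 if they differ,
# and 2 if a temporary file could not be created.
is_idempotent() {
    filter=$1
    input=$2
    once=$(mktemp) || return 2
    twice=$(mktemp) || { rm -f "$once"; return 2; }

    # First pass: what git stores when the original file is added.
    sh -c "$filter" < "$input" > "$once"
    # Second pass: what git computes when it re-cleans the checked-out file.
    sh -c "$filter" < "$once" > "$twice"

    if cmp -s "$once" "$twice"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$once" "$twice"
    return "$rc"
}
```

For example, `is_idempotent pdfresetdate x.pdf || echo "not idempotent"` should print the warning for the script in this thread, matching the differing checksums of y.pdf and z.pdf.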
Re: make git ignore the timestamp embedded in PDFs
Johannes Sixt <j...@kdbg.org> writes:

> On 18.05.2013 09:42, Andreas Leha wrote:
>>> On 14.05.2013 15:17, Andreas Leha wrote:
>>>> Hi all,
>>>>
>>>> how can I make git ignore the time stamp(s) in a PDF? Two PDFs that
>>>> differ only in these time stamps should be considered identical.
>>>> ...
>>>> What I tried is a filter:
>>>> ,[ ~/.gitconfig ]
>>>> | [filter "pdfresetdate"]
>>>> |     clean = pdfresetdate
>>>> `
>>>>
>>>> This 'works' as far as the committed pdf indeed has the date reset to
>>>> my default value. However, when I re-checkout the files, they are
>>>> marked modified by git.
>>>
>>> I'm using cleaned files every now and then, but not on Linux. I have
>>> never observed this behavior recently.
>>>
>>> If you 'git add' the file, does it keep its modified state?
>>
>> yes.
>>
>>> Does 'git diff' tell a difference?
>>
>> no.
>
> I do not believe you. I'm sure that "Binary files differ" was reported.

You are correct, of course. I had forgotten that I had also enabled a
special diff for pdf files that reports the difference in the pdfinfo
output.

> The reason is that your pdfresetdate script is not idempotent. Look:
>
> $ pdfresetdate < x.pdf > y.pdf
> $ pdfresetdate < y.pdf > z.pdf
> $ md5sum x.pdf y.pdf z.pdf
> c46a7097574a035e89d1a46d93c83528  x.pdf
> 8e6d942b4cc7d8a4dfe6898867573617  y.pdf
> e6333bc0f8ab9781d3e1d811a392d516  z.pdf

Thanks for that. I had not noticed it because of the non-binary diff I
had enabled.

> A file that was already cleaned by the clean filter must not be
> modified, i.e., y.pdf and z.pdf should be identical. But they are not.
>
> Fix your clean filter.

I will (try to) do so. Anyway, git does not seem to be responsible for my
issue. Thanks for that clear analysis!

Regards,
Andreas
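Andreas mentions a special diff for PDF files that reports differences in the pdfinfo output; his actual configuration is not shown in the thread. A common way to get that behaviour, offered here as an assumption, is a textconv diff driver. The driver name `pdf` is arbitrary; git runs the textconv command with the path of a file to convert and diffs the text it prints. Note that textconv only changes how `git diff` displays a change: whether git reports the file as modified still depends on the bytes the clean filter produces, which is how such a diff can mask a non-idempotent filter.

```ini
# ~/.gitconfig: show PDFs in "git diff" as the text printed by pdfinfo
[diff "pdf"]
	textconv = pdfinfo
```

The driver is then enabled per path in .gitattributes, next to the existing filter, e.g. `*.pdf filter=pdfresetdate diff=pdf`.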
make git ignore the timestamp embedded in PDFs
Hi all,

how can I make git ignore the time stamp(s) in a PDF? Two PDFs that
differ only in these time stamps should be considered identical.

Here is an example:
,
| pdfinfo some.pdf
| Title:          R Graphics Output
| Creator:        R
| Producer:       R 2.15.1
| CreationDate:   Thu Jan 24 13:43:31 2013   <== these entries
| ModDate:        Thu Jan 24 13:43:31 2013   <== should be ignored
| Tagged:         no
| Pages:          1
| Encrypted:      no
| Page size:      504 x 504 pts
| File size:      54138 bytes
| Optimized:      no
| PDF version:    1.4
`

What I tried is a filter:
,[ ~/.gitconfig ]
| [filter "pdfresetdate"]
|     clean = pdfresetdate
`

With this filter script:
,[ pdfresetdate ]
| #!/bin/bash
|
| FILEASARG=true
| if [ $# == 0 ]; then
|     FILEASARG=false
| fi
|
| if $FILEASARG ; then
|     FILENAME=$1
| else
|     FILENAME=`mktemp`
|     cat /dev/stdin > ${FILENAME}
| fi
|
| TMPFILE=`mktemp`
| TMPFILE2=`mktemp`
|
| ## dump the pdf metadata to a file and replace the dates
| pdftk $FILENAME dump_data | sed -e '{N;s/Date\nInfoValue: D:.*/Date\nInfoValue: D:19790101072619/}' > $TMPFILE
|
| ## update the pdf metadata
| pdftk $FILENAME update_info $TMPFILE output $TMPFILE2
|
| ## overwrite the original pdf
| mv -f $TMPFILE2 $FILENAME
|
| ## clean up
| rm -f $TMPFILE
| rm -f $TMPFILE2
| if [ -n $FILEASARG ] ; then
|     cat $FILENAME
| fi
`

This 'works' as far as the committed pdf indeed has the date reset to my
default value. However, when I re-checkout the files, they are marked
modified by git.

So, my question is: How can I make git *completely* ignore the embedded
date in the PDF?

Many thanks in advance for any help!

Regards,
Andreas

PS: I had posted this question (without much success) here:
http://stackoverflow.com/questions/16058187/make-git-ignore-the-date-in-pdf-files
and, with no answer, on the git-users mailing list:
https://groups.google.com/forum/#!topic/git-users/KqtecNa3cOc
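Hannes's test shows that the pdftk round trip in the script above does not reproduce its own output. One way to get an idempotent clean filter is to overwrite the date digits in place instead of rewriting the file. This is a minimal sketch under strong assumptions, not a tested replacement for pdfresetdate: it assumes the Info dictionary dates appear as uncompressed `/CreationDate (D:YYYYMMDDHHmmSS...)` and `/ModDate (D:...)` strings, and that GNU sed is available (it passes NUL bytes through and keeps a missing final newline missing). The function name `pdf_reset_dates` is hypothetical. Because only the 14 date digits are replaced with 19790101072619, the file keeps its length and its cross-reference offsets, and a second run reproduces its input byte for byte. Dates stored elsewhere (XMP metadata, compressed object streams) and any other time-derived bytes are not touched.

```shell
# pdf_reset_dates: hypothetical idempotent clean filter (stdin -> stdout).
# Overwrites the 14 digits after "/CreationDate (D:" or "/ModDate (D:"
# (optional whitespace before the parenthesis) with 19790101072619.
# Timezone suffixes such as +01'00' are left unchanged.
# LC_ALL=C makes sed treat the PDF as plain bytes; GNU sed is assumed.
pdf_reset_dates() {
    LC_ALL=C sed -E 's#(/(CreationDate|ModDate)[[:space:]]*\(D:)[0-9]{14}#\119790101072619#g'
}
```

To use it with git, the sed command would go into an executable script on the PATH and be registered with `git config --global filter.pdfresetdate.clean <script-name>`; the .gitattributes line stays as it is.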
Re: make git ignore the timestamp embedded in PDFs
On 14.05.2013 15:17, Andreas Leha wrote:
> Hi all,
>
> how can I make git ignore the time stamp(s) in a PDF? Two PDFs that
> differ only in these time stamps should be considered identical.
> ...
> What I tried is a filter:
> ,[ ~/.gitconfig ]
> | [filter "pdfresetdate"]
> |     clean = pdfresetdate
> `
>
> This 'works' as far as the committed pdf indeed has the date reset to
> my default value. However, when I re-checkout the files, they are
> marked modified by git.

I'm using cleaned files every now and then, but not on Linux. I have
never observed this behavior recently.

If you 'git add' the file, does it keep its modified state? Does 'git
diff' tell a difference?

-- 
Hannes