Hello community,

here is the log from the commit of package pdfcompare for openSUSE:Factory checked in at 2016-04-28 16:57:22
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/pdfcompare (Old)
 and      /work/SRC/openSUSE:Factory/.pdfcompare.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "pdfcompare"

Changes:
--------
--- /work/SRC/openSUSE:Factory/pdfcompare/pdfcompare.changes	2014-01-14 21:51:50.000000000 +0100
+++ /work/SRC/openSUSE:Factory/.pdfcompare.new/pdfcompare.changes	2016-04-28 17:02:38.000000000 +0200
@@ -1,0 +2,18 @@
+Tue Apr 19 15:24:08 UTC 2016 - [email protected]
+
+- V1.6.8 - cleaner popup annotations unless -S
+         - no navigation buttons unless -f ..N..
+
+-------------------------------------------------------------------
+Mon Apr 18 18:17:25 UTC 2016 - [email protected]
+
+- V1.6.7 - support Ubuntu 14.04
+
+-------------------------------------------------------------------
+Mon Apr 18 13:57:26 UTC 2016 - [email protected]
+
+- V1.6.6 - hunspell usage hint: how to add words to private dictonary.
+- pull_github.sh added.
+- use pdf_highlight.py if pdfcompare.py is not in the tar. Historic name.
+
+-------------------------------------------------------------------

Old:
----
  pdfcompare-1.6.5.tar.bz2

New:
----
  Makefile
  debian.changelog
  debian.compat
  debian.control
  debian.pdfcompare.install
  debian.rules
  debian.series
  pdfcompare-1.6.8.tar.bz2
  pdfcompare.dsc
  pull_github.sh

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ pdfcompare.spec ++++++
--- /var/tmp/diff_new_pack.71YLEz/_old	2016-04-28 17:02:40.000000000 +0200
+++ /var/tmp/diff_new_pack.71YLEz/_new	2016-04-28 17:02:40.000000000 +0200
@@ -1,7 +1,7 @@
 #
 # spec file for package pdfcompare
 #
-# Copyright (c) 2014 SUSE LINUX Products GmbH, Nuernberg, Germany.
+# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -17,13 +17,14 @@
 
 
 Name:           pdfcompare
-Version:        1.6.5
+Version:        1.6.8
 Release:        0
 Summary:        Compare two PDF files, write a resulting PDF with highlighted changes
 License:        GPL-2.0
 Group:          Productivity/Publishing/PDF
 Url:            https://github.com/jnweiger/pdfcompare
 Source:         pdfcompare-%version.tar.bz2
+Source100:      pull_github.sh
 # These BuildRequires are only required for the testsuite
 BuildRequires:  poppler-tools
 
@@ -75,6 +76,7 @@
 %endif
 
 %install
+test -f pdfcompare.py || mv pdf_highlight.py pdfcompare.py
 install -Dm 0755 pdfcompare.py %{buildroot}%{_bindir}/pdfcompare
 
 %files

++++++ debian.changelog ++++++
pdfcompare (1.6.8-1) stable; urgency=low

  * V1.6.8 - cleaner popup annotations unless -S
  * no navigation buttons unless -f ..N..

 -- Jürgen Weigert <[email protected]>  Tue, 19 Apr 2016 15:45:49 +0200

pdfcompare (1.6.7-1ubuntu1) stable; urgency=medium

  * 1.6.7 support ubuntu 14.04

 -- Jürgen Weigert <[email protected]>  Mon, 18 Apr 2016 20:17:01 +0200

pdfcompare (1.6.6-1) stable; urgency=low

  * V1.6.6 - hunspell usage hint: how to add words to private dictonary.
  * pull_github.sh added.

 -- Jürgen Weigert <[email protected]>  Mon, 18 Apr 2016 13:55:48 +0200

pdfcompare (1.6.5-2) stable; urgency=medium

  * Dependencies added, debian.rules file fixed.

 -- Jürgen Weigert <jw@jw-ThinkPad-T440s>  Mon, 18 Apr 2016 13:13:31 +0200

pdfcompare (1.6.5-1) stable; urgency=medium

  * first DEB packaging

 -- Jürgen Weigert <jw@jw-ThinkPad-T440s>  Mon, 04 Apr 2016 20:40:05 +0200

++++++ debian.compat ++++++
9
++++++ debian.control ++++++
Source: pdfcompare
Section: unknown
Priority: optional
Maintainer: Jürgen Weigert <[email protected]>
Build-Depends: debhelper (>= 4.2.21)

Package: pdfcompare
Architecture: all
# poppler-utils is needed for /usr/bin/pdftohtml
# python-pygame is needed for pygame.font only.
Depends: ${shlibs:Depends}, ${misc:Depends}, python-pypdf, python-pygame, poppler-utils, python-reportlab, hunspell, hunspell-en-us, hunspell-de-de Description: Compare two PDF files, write a resulting PDF with highlighted changes. Potential text portions that were moved around are recognized and analyzed for similarity with a second level diff. ++++++ debian.pdfcompare.install ++++++ pdfcompare /usr/bin ++++++ debian.rules ++++++ #!/usr/bin/make -f %: echo 'all:' > Makefile # don't build anything dh $@ override_dh_auto_install: mv pdfcompare.py pdfcompare || mv pdf_highlight.py pdfcompare chmod 0755 pdfcompare dh_auto_install -- INSTALL_ROOT=$(CURDIR)/debian/tmp ++++++ pdfcompare-1.6.5.tar.bz2 -> pdfcompare-1.6.8.tar.bz2 ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/MANIFEST.in new/pdfcompare-1.6.8/MANIFEST.in --- old/pdfcompare-1.6.5/MANIFEST.in 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/MANIFEST.in 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1 @@ +include *.txt diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/Makefile new/pdfcompare-1.6.8/Makefile --- old/pdfcompare-1.6.5/Makefile 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/Makefile 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,26 @@ + +VER=1.6.8 +D=dist/pdfcompare-$(VER) +EXCL=--exclude \*.orig --exclude \*~ + +all: check tar + +check test: + cd test; make test VER=$(VER) + +testrefresh refreshtest: + cd test; make test refresh=yes + +clean: + rm -rf dist *.orig *~ + rm -rf test/*.orig test/*~ + +tar dist: + rm -rf dist + mkdir -p $D + ln -s ../../pdfcompare.py $D/pdfcompare.py + ln -s ../../COPYING $D/ + ln -s ../../test $D/test + cd dist; tar jhcvf ../pdfcompare-$(VER).tar.bz2 pdfcompare-$(VER) $(EXCL) + rm -rf dist + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/README.txt 
new/pdfcompare-1.6.8/README.txt --- old/pdfcompare-1.6.5/README.txt 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/README.txt 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,13 @@ +pdfcompare +========== + +Compare text of two PDF files, write a resulting PDF with highlighted changes. +Potential text portions that were moved around are recognized and analyzed +for similarity with a second level diff. + +Required Packages: + +* pyPdf +* reportlab.pdfgen +* reportlab.lib.colors +* pygame.font' diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/TODO.md new/pdfcompare-1.6.8/TODO.md --- old/pdfcompare-1.6.5/TODO.md 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/TODO.md 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,81 @@ +TODO + +* Test with pyPDF2 + +* Test with python3 + +* Is a Windows installer possible? + +* Test popups with Microsoft Edge Browser + +* hunspell issues: + - python-HunspellPure should be a separate module. Split it. + - we artificially limit to [A-Z_-]+ for words. This is bad for german umlauts. + - extend hunspell to allow a progress indicator callback. + (counting newlines seen in response) + +* testsuite + - maybe prepare a test script that allows numbers to be off by some + percentage, but wants everything else precise. + This helps with pdf source checking. + +* improve --log logfile generator. + produce a json/xml/csv/txt file describing the diffs, -s word locations + and --spellcheck results. + +* one letter changes always become word changes. + Either run in single character mode. Or try to trim the replaced text for + common suffix or common prefix. + +* Normalize nonbreaking spaces to spaces. + This is important when e.g. markdown source has a 0x20 space, but rendered + PDF may have instead. + + + +DONE: + +* write compressed streams. + +* catch file open errors, before ET complains about 0 elements. + +* perform only same-length-replace. 
All other replace-ops should be replace+insert + or replace+delete. + +* place delete marker at last text end position, rather than next text start position. + This is a tricky, implementation in markword(). + +* testsuite + - a 1:1 comparison is not possible, as e.g. poppler-0.18 and poppler-0.20 + produce differences in the exact coordinates used. + - make a fuzzy comparison against templates with python-cv, pHash, etc... + http://stackoverflow.com/questions/1819124/image-comparison-algorithm suggests + Scipy. imgcmp.py does this. + - generate several output.pdf, convert via ImageMagick to png, + - run pdfcompare --version. + +* nicer +++---~~~== git style diagnostics per page, rather than saying '87 hits'. + +* if pagebreaks are within deleted text, point this out in the baloon popup. + +* Navigation from changebar to changebar, if there are many unchanged pages to jump over. + - calculation, graphics done. Hack with relocated navigation done. + +* popups are all in one line in okular. Need to provide linebreaks manually, sigh. + +* introduce an ignore-margin for text changes. Any words there will not go into + the compare wordlists, and will not match with --search. This is meant to skip + over pagenumbers and other bottom or top matter, that is not considered part + of the document contents stream. + --feature margin shall draw the margin area as shaded gray, so that we know + where we are. + +* feature: + pipe the wordlist through hunspell, if hunspell is available. + use search-highlights to mark all words for which hunspell has spelling + suggestions. + +* feature: + added a trivial --log implementation + +* second level diff for moved blocks. 
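
Reviewer note on the TODO item "one letter changes always become word changes" in TODO.md: the following is a minimal sketch, not taken from pdfcompare itself, of how a replaced word pair could be trimmed for a common prefix and a common suffix so that only the differing core would need to be marked. The function name trim_common_affixes is hypothetical. The sketch assumes plain Python strings and relies only on os.path.commonprefix from the standard library, which compares its arguments character by character.

```python
import os


def trim_common_affixes(old, new):
    """Split a replaced word pair into (prefix, old_core, new_core, suffix).

    Hypothetical helper for illustration only; pdfcompare does not
    contain a function of this name.
    """
    # Longest prefix shared by both words.
    prefix = os.path.commonprefix([old, new])
    old_rest = old[len(prefix):]
    new_rest = new[len(prefix):]
    # Longest shared suffix of the remainders, found on the reversed strings.
    # Working on the remainders keeps prefix and suffix from overlapping.
    suffix = os.path.commonprefix([old_rest[::-1], new_rest[::-1]])[::-1]
    cut = len(suffix)
    old_core = old_rest[:len(old_rest) - cut]
    new_core = new_rest[:len(new_rest) - cut]
    return prefix, old_core, new_core, suffix


if __name__ == "__main__":
    # A one-letter insertion: only the inserted letter remains in new_core.
    print(trim_common_affixes("dictonary", "dictionary"))
```

With such a split, a same-length core could be treated as a replace, while an empty old_core or new_core would correspond to a pure insert or delete, which is in line with the "perform only same-length-replace" entry in the DONE list.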
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/hunspell.py new/pdfcompare-1.6.8/hunspell.py --- old/pdfcompare-1.6.5/hunspell.py 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/hunspell.py 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,184 @@ +# hunspell.py -- a wrapper class for hunspell +# +# (c) 2013 Juergen Weigert [email protected] +# Distribute under GPL-2.0 or ask +# +# 2013-01-31, V0.1 jw - initial draught: word by word I/O +# 2013-02-01, V0.1 jw - added own _readline() to use buffering. Pythons readline() +# does single byte read()s, which is slow. +# 2013-02-02, V0.2 jw - check_words() now remembers a wordlist, pushes all out +# with an extra thread, reads back async, and reassembles. +# This is much more efficient +# +import os,subprocess,re + +__VERSION__ = '0.2' + +class Hunspell(): + """A pure python module to interface with hunspell. + It was written as a replacement for the hunspell module from + http://code.google.com/p/pyhunspell/, which appears to be in unmaintained. + and more difficult to use, due to lack of examples and documentation. 
+ """ + def __init__(self, dicts=['en_US']): + self.cmd = ['hunspell', '-i', 'utf-8', '-a'] + self.dicts = dicts + self.proc = None + self.attr = None + self.buffer = '' + + def _start(self): + cmd = self.cmd + if self.dicts is not None and len(self.dicts): + cmd += ['-d', ','.join(self.dicts)] + try: + self.proc = subprocess.Popen(cmd, shell=False, + stdin=subprocess.PIPE, stdout=subprocess.PIPE) + except OSError as e: + self.proc = "%s failed: errno=%d %s" % (cmd, e.errno, e.strerror) + raise OSError(self.proc) + header = '' + while True: + more = self.proc.stdout.readline().rstrip() + if len(more) > 5 and more[0:5] == '@(#) ': # version line with -a + self.version = more[5:] + break + elif len(more) > 9 and more[0:9] == 'Hunspell ': # version line w/o -a + self.version = more + break + else: + header += more # stderr should be collected here. It does not work + if len(header): self.header = header + self.buffer = '' + + def _readline(self): + # python readline() is horribly stupid on this pipe. It reads single + # byte, just like java did in the 1980ies. Sorry, this is not + # acceptable in 2013. 
+ if self.proc is None: + raise Error("Hunspell_readline before _start") + while True: + idx = self.buffer.find('\n') + if idx < 0: + more = self.proc.stdout.read() + if not len(more): + r = self.buffer + self.buffer = '' + return r + self.buffer += more + else: + break + r = self.buffer[0:idx+1] + self.buffer = self.buffer[idx+1:] + return r + + def _load_attr(self): + try: + p = subprocess.Popen(self.cmd + ['-D'], shell=False, + stdin=open('/dev/null'), stderr=subprocess.STDOUT, stdout=subprocess.PIPE) + except OSError as e: + raise OSError("%s failed: errno=%d %s" % (self.cmd + ['-D'], e.errno, e.strerror)) + self.attr = {} + header='' + while True: + line = p.stdout.readline().rstrip() + if not len(line): + break + # AVAILABLE DICTIONARIES (path is not mandatory for -d option): + m = re.match('([A-Z]+\s[A-Z]+).*:$', line) + if m: + header = m.group(1) + self.attr[header] = [] + elif len(header): + self.attr[header].append(line) + return self.attr + + def dicts(self,dicts=None): + """returns or sets the dictionaries that hunspell shall try to use""" + if dicts is not None: + self.dicts = dicts + return self.dicts + + def list_dicts(self): + """query hunspell about the available dictionaries. + Returns a key value dict where keys are short names, and values + are path names. You can pick some or all of the returned keys, + and use the list (or one) as an argument to + the next Hunspell() instance, or as an argument + to the dicts() method. + """ + if self.attr is None: self._load_attr() + r = {} + for d in self.attr['AVAILABLE DICTIONARIES']: + words = d.split('/') + r[words[-1]] = d + return r + + def dict_search_path(self): + """returns a list of pathnames, actually used by hunspell to load + spelling dictionaries from. 
+ """ + if self.attr is None: self._load_attr() + r = [] + for d in self.attr['SEARCH PATH']: + r += d.split(':') + return r + + def dicts_loaded(self): + """query the spelling dictionaries that will actually be used for + the next check_words() call. + """ + if self.attr is None: self._load_attr() + return self.attr['LOADED DICTIONARY'] + + def check_words(self, words): + """takes a list of words as parameter, and checks them against the + loaded spelling dictionaries. A key value dict is returned, where + every key represents a word that was not found in the + spelling dictionaries. Values are lists of correction suggestions. + check_words() is implemented by calling the hunspell binary in pipe mode. + This is fairly robust, but not optimized for efficiency. + """ + if self.proc is None: + self._start() + childpid = os.fork() + if childpid == 0: + for w in words: + self.proc.stdin.write(("^"+w+"\n").encode('utf8')) + os._exit(0) + self.proc.stdin.close() + bad_words = {} + + while True: + line = self._readline() + if len(line) == 0: + break + line = line.rstrip() + if not len(line) or line[0] in '*+-': continue + + if line[0] == '#': + car = line.split(' ') + bad_words[car[1]] = [] # no suggestions + elif line[0] != '&': + print "hunspell protocoll error: '%s'" % line + continue # unknown stuff + # '& Radae 7 0: Radar, Ramada, Estrada, Prada, Rad, Roadie, Readable\n' + a = line.split(': ') + if len(a) >= 2: + car = a[0].split(' ') + cdr = a[1].split(', ') + bad_words[car[1]] = cdr + else: + print("bad hunspell reply: %s, split as %s" % (line, a)) + self.proc = None + return bad_words + + +if __name__ == "__main__": + from pprint import pprint + h = Hunspell() + pprint(h.list_dicts()) + pprint(h.dict_search_path()) + pprint(h.check_words(["ppppp", '123', '', 'gorkicht', 'gemank', 'haus', ''])) + pprint(h.check_words(["Radae", 'blood', 'mensch', 'green', 'blea', 'fork'])) + pprint(h.version) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' 
'--exclude=.svnignore' old/pdfcompare-1.6.5/imgcmp.py new/pdfcompare-1.6.8/imgcmp.py --- old/pdfcompare-1.6.5/imgcmp.py 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/imgcmp.py 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,109 @@ +#! /usr/bin/python +# -*- coding: utf-8 -*- +# +# compare two images using Scipy +# (c) 2013 - [email protected] - distributer under GPL-2.0 or ask. +# +# Dependencies: +# sudo zypper in python-scipy +# sudo zypper in ImageMagick +# +# See also python-pHash, and python-opencv +# http://stackoverflow.com/questions/13379909/compare-similarity-of-images-using-opencv-with-python + +from __future__ import print_function, division + +import sys, os, re, tempfile +from pprint import pprint + +import scipy as sp +from scipy.misc import imread +from scipy.signal.signaltools import correlate2d as c2d + + +class CompareImageException(Exception): + """ + Exception class for comparing two files + """ + def __init__(self, c11, c12, c22): + self.c11=c11 + self.c12=c12 + self.c22=c22 + def __repr__(self): + return "(%.2f %.2f %.2f)" % (self.c11, self.c12, self.c22) + __str__=__repr__ + + +def load_img(fname): + """ + Load and convert images + """ + # get JPG image as Scipy array, RGB (3 layer) + if re.search("\.pdf$", fname, re.I): + # convert PDF to JPG + tf = tempfile.NamedTemporaryFile(delete=True, suffix=".jpg") + print("creating %s" % tf.name) + os.system("convert '%s[0]' -geometry 100x100 '%s'" % (fname, tf.name)) + data = imread(tf.name) + tf.close() + else: + data = imread(fname) + # convert to grey-scale using W3C luminance calc + ## pprint([data]) + ## ValueError: matrices are not aligned, if alpha channel... 
+ lum = [299, 587, 114] + if len(data[0][0]) > 3: + lum.append(0) + data = sp.inner(data, lum) / 1000.0 + # normalize per http://en.wikipedia.org/wiki/Cross-correlation + return (data - data.mean()) / data.std() + + +def compare(file1, file2, diff): + """ + Compares two files (JPEG, PNG or PDF) + """ + im1 = load_img(file1) + im2 = load_img(file2) + c11 = c2d(im1, im1, mode='same') # baseline + c22 = c2d(im2, im2, mode='same') # baseline + c12 = c2d(im1, im2, mode='same') + m = [c11.max(), c12.max(), c22.max()] + diff_ab = 100 * (1-m[1]/m[0]) + diff_ba = 100 * (1-m[1]/m[2]) + + fail=max(diff_ab,diff_ba) > diff + + if fail: + raise CompareImageException(c11.max(), c12.max(), c22.max()) + + return fail + +def main(): + """ + Compares two files (JPEG, PNG or PDF) + """ + if len(sys.argv) < 4: + print("""Usage: %s FILE1 FILE2 N.NN + + FILE1,FILE2 can be in JPEG, PNG, or PDF format. + N.NN should be a small floating point number. It represents + the allowed difference in the image metrics. + correlate2d from scipy.signal.signaltools is used to compute + the metrics. 
+ """ % sys.argv[0]) + sys.exit(0) + diff_allowed=float(sys.argv[3]) + try: + fail=compare(sys.argv[1],sys.argv[2],diff_allowed) + except CompareImageException as i: + print("error: %s" % i) + + + print("limit: %.2f%% -> %s" % (diff_allowed, ("OK","FAIL")[fail])) + if fail: sys.exit(1) + + +if __name__ == "__main__": + main() + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/man/Makefile new/pdfcompare-1.6.8/man/Makefile --- old/pdfcompare-1.6.5/man/Makefile 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/man/Makefile 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,20 @@ +# +# apt-get install xsltproc fop +# + +## openSUSE: +DB=/usr/share/xml/docbook/stylesheet/nwalsh/current/ +## Ubuntu: +DB=/usr/share/xml/docbook/stylesheet/nwalsh/ + +all: man html pdf + +man: + xsltproc $(DB)/manpages/docbook.xsl pdfcompare.xml + +html: + xsltproc --output pdfcompare.html $(DB)/xhtml/docbook.xsl pdfcompare.xml + +pdf: + xsltproc --stringparam paper.type A4 --output pdfcompare.fo $(DB)/fo/docbook.xsl pdfcompare.xml + fop pdfcompare.fo pdfcompare.pdf && rm pdfcompare.fo diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/man/pdfcompare.1 new/pdfcompare-1.6.8/man/pdfcompare.1 --- old/pdfcompare-1.6.5/man/pdfcompare.1 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/man/pdfcompare.1 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,189 @@ +'\" t +.\" Title: pdfcompare +.\" Author: Jürgen Weigert +.\" Generator: DocBook XSL Stylesheets v1.78.1 <http://docbook.sf.net/> +.\" Date: 04/18/2016 +.\" Manual: @VERSION@ +.\" Source: https://github.com/jnweiger/pdfcompare @VERSION@ +.\" Language: English +.\" +.TH "PDFCOMPARE" "1" "04/18/2016" "https://github\&.com/jnweiger/" "@VERSION@" +.\" ----------------------------------------------------------------- +.\" * Define some portability stuff +.\" 
----------------------------------------------------------------- +.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.\" http://bugs.debian.org/507673 +.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html +.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.ie \n(.g .ds Aq \(aq +.el .ds Aq ' +.\" ----------------------------------------------------------------- +.\" * set default formatting +.\" ----------------------------------------------------------------- +.\" disable hyphenation +.nh +.\" disable justification (adjust text to left margin only) +.ad l +.\" ----------------------------------------------------------------- +.\" * MAIN CONTENT STARTS HERE * +.\" ----------------------------------------------------------------- +.SH "NAME" +pdfcompare \- Highlight words in a PDF file +.SH "SYNOPSIS" +.HP \w'\fBpdfcompare\fR\ 'u +\fBpdfcompare\fR [\-h] [\-c\ \fIOLDFILE\fR] [\-d\ \fIDECRYPT_KEY\fR] [\-e] [\-i] [\-l\ \fILOGFILE\fR] [\-m\ \fIOPS\fR] [\-n] [\-o\ \fIOUTFILE\fR] [\-s\ \fIWORD_REGEXP\fR] [\-\-spell] [\-\-strict] [\-t\ \fITRANSP\fR] [\-B] [\-C\ NAME=\fIR\fR,\fIG\fR,\fIB\fR] [\-D] [\-F\ \fIFIRST_PAGE\fR] [\-L\ \fILAST_PAGE\fR] [\-M\ N,E,W,S] [\-V] [\-X] +.br +{INFILE} [INFILE2] +.SH "POSITIONAL ARGUMENTS" +.PP +\fBINFILE\fR +.RS 4 +the required PDF input file +.RE +.PP +\fBINFILE2\fR +.RS 4 +an optional +\(lqnewer\(rq +PDF input file; alternate syntax to +\fB\-c\fR +.RE +.SH "OPTIONAL ARGUMENTS" +.PP +\fB\-B\fR, \fB\-\-below\fR +.RS 4 +Paint the highlight markers below the text\&. Try this if the normal merge crashes\&. Use with care, highlights may disappear below background graphics\&. Default: BELOW=\*(AqFALSE\*(Aq +.RE +.PP +\fB\-c \fR\fB\fIOLDFILE\fR\fR, \fB\-\-compare\-text \fR\fB\fIOLDFILE\fR\fR +.RS 4 +Mark added, deleted and replaced text (or see +\fB\-m\fR) with regard to +\fIOLDFILE\fR\&. File formats +\&.pdf, +\&.xml, +\&.txt +are recognized by their suffix\&. 
The comparison works word by word\&. +.RE +.PP +\fB\-C NAME=\fR\fB\fIR\fR\fR\fB,\fR\fB\fIG\fR\fR\fB,\fR\fB\fIB\fR\fR, \fB\-\-search\-color NAME=\fR\fB\fIR\fR\fR\fB,\fR\fB\fIG\fR\fR\fB,\fR\fB\fIB\fR\fR +.RS 4 +Set colors of the search highlights as an RGB triplet; R,G,B ranges are 0\&.0\-1\&.0 each; valid names are \*(Aqadd,\*(Aqdelete\*(Aq,\*(Aqchange\*(Aq,\*(Aqequal\*(Aq,\*(Aqmargin\*(Aq,\*(Aqall\*(Aq; default name is \*(Aqequal\*(Aq, which is also used for +\fB\-s\fR; default colors are A=0\&.3,1,0\&.3 /*green*/ C=0\&.9,0\&.8,0 /*yellow*/ B=0\&.9,0\&.9,0\&.9 /*gray*/ E=1,0,1 /*pink*/ D=1,0\&.3,0\&.3 /*red*/ M=0\&.7,1,1 /*blue*/ +.RE +.PP +\fB\-D\fR, \fB\-\-debug\fR +.RS 4 +Enable debugging\&. Prints more on stdout, dumps several +*\&.xml +and +*\&.pdf +files\&. +.RE +.PP +\fB\-e\fR, \fB\-\-exclude\-irrelevant\-pages\fR +.RS 4 +With +\fB\-s\fR; show only matching pages\&. With +\fB\-c\fR: show only changed pages; default: reproduce all pages from +\fIINFILE\fR +in +\fIOUTFILE\fR +.RE +.PP +\fB\-f \fR\fB\fIFEATURES\fR\fR, \fB\-\-features \fR\fB\fIFEATURES\fR\fR +.RS 4 +Specify how to mark\&. Allowed values are \*(Aqhighlight\*(Aq, \*(Aqchangebar\*(Aq, \*(Aqpopup\*(Aq, \*(Aqnavigation\*(Aq, \*(Aqwatermark\*(Aq, \*(Aqmargin\*(Aq\&. 
Default: H,C,P,N,W,B +.RE +.PP +\fB\-F \fR\fB\fIFIRST_PAGE\fR\fR, \fB\-\-first\-page \fR\fB\fIFIRST_PAGE\fR\fR +.RS 4 +Skip some pages at start of document; see also +\fB\-L\fR; default: all pages +.RE +.PP +\fB\-h\fR, \fB\-\-help\fR +.RS 4 +Show this help message and exit +.RE +.PP +\fB\-i\fR, \fB\-\-nocase\fR +.RS 4 +Make +\fB\-s\fR +case insensitive; default: case sensitive +.RE +.PP +\fB\-L \fR\fB\fILAST_PAGE\fR\fR, \fB\-\-last\-page \fR\fB\fILAST_PAGE\fR\fR +.RS 4 +Limit pages processed; this counts pages, it does not use document page numbers; see also +\fB\-F\fR; default: all pages +.RE +.PP +\fB\-l \fR\fB\fILOGFILE\fR\fR, \fB\-\-log \fR\fB\fILOGFILE\fR\fR +.RS 4 +Write an python datastructure describing all the overlay objects on each page\&. Default none\&. +.RE +.PP +\fB\-M N,E,W,S\fR, \fB\-\-margins N,E,W,S\fR +.RS 4 +Specify margin space to ignore on each page\&. A margin width is expressed in units of ca\&. 100dpi\&. Specify four numbers in the order north,east,west,south\&. Default: 0,0,0,0 +.RE +.PP +\fB\-m \fR\fB\fIOPS\fR\fR, \fB\-\-mark \fR\fB\fIOPS\fR\fR +.RS 4 +Specify what to mark\&. Used with +\fB\-c\fR\&. Allowed values are \*(Aqadd\*(Aq,\*(Aqdelete\*(Aq,\*(Aqchange\*(Aq,\*(Aqequal\*(Aq\&. Multiple values can be listed comma\-seperated; abbreviations are allowed\&. Default: A,D,C +.RE +.PP +\fB\-n\fR, \fB\-\-no\-output\fR +.RS 4 +Do not write an output file; print diagnostics only; default: write output file as per +\fB\-o\fR +.RE +.PP +\fB\-o \fR\fB\fIOUTFILE\fR\fR, \fB\-\-output \fR\fB\fIOUTFILE\fR\fR +.RS 4 +Write output to FILE; default: +output\&.pdf +.RE +.PP +\fB\-\-spell\fR, \fB\-\-spell\-check\fR +.RS 4 +Run the text body of the (new) PDF through +\fBhunspell\fR\&. Unknown words are underlined\&. Use e\&.g\&. \*(Aqenv DICTIONARY=en_US \&.\&.\&.\*(Aq (or de_DE, \&.\&.\&.) to specify the spelling dictionary, if your system has more than one\&. To add new words to your private dictionary use e\&.g\&. 
\*(Aqecho "ownCloud" >> ~/\&.hunspell_en_US\*(Aq Check with +\fBhunspell \fR\fB\fB\-D\fR\fR +and study +\fBhunspell\fR(1)\&. +.RE +.PP +\fB\-\-strict\fR +.RS 4 +Show really all differences; default: ignore removed hyphenation; ignore character spacing inside a word +.RE +.PP +\fB\-t \fR\fB\fITRANSP\fR\fR, \fB\-\-transparency \fR\fB\fITRANSP\fR\fR +.RS 4 +Set transparency of the highlight; invisible: 0\&.0; full opaque: 1\&.0; default: 0\&.6 +.RE +.PP +\fB\-V\fR, \fB\-\-version\fR +.RS 4 +Print the version number and exit +.RE +.PP +\fB\-X\fR, \fB\-\-no\-compression\fR +.RS 4 +Write uncompressed PDF\&. Default: FlateEncode filter compression\&. +.RE +.SH "AUTHORS" +.PP +\fBJürgen Weigert\fR +.RS 4 +Developer +.RE +.PP +\fBThomas Schraitle\fR <\&toms@opensuse\&.org\&> +.RS 4 +Manpage author +.RE diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/man/pdfcompare.xml new/pdfcompare-1.6.8/man/pdfcompare.xml --- old/pdfcompare-1.6.5/man/pdfcompare.xml 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/man/pdfcompare.xml 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,272 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" + "http://www.docbook.org/xml/4.5/docbookx.dtd" +[ + <!ENTITY product "pdfcompare"> + <!ENTITY cmd "pdfcompare"> +]> +<refentry lang="en" id="pdfcompare"> + <refentryinfo> + <productname>&product;</productname> + <author> + <firstname>Jürgen</firstname> + <surname>Weigert</surname> + <contrib>Developer</contrib> + </author> + <othercredit class="technicaleditor"> + <firstname>Thomas</firstname> + <surname>Schraitle</surname> + <email>[email protected]</email> + <contrib>Manpage author</contrib> + </othercredit> + </refentryinfo> + <refmeta> + <refentrytitle>&cmd;</refentrytitle> + <manvolnum>1</manvolnum> + <refmiscinfo class="version">@VERSION@</refmiscinfo> + <refmiscinfo 
class="source">https://github.com/jnweiger/pdfcompare</refmiscinfo> + <!--<refmiscinfo class="manual"></refmiscinfo>--> + </refmeta> + + <refnamediv> + <refname>&product;</refname> + <refpurpose>Highlight words in a PDF file</refpurpose> + </refnamediv> + + <refsynopsisdiv id="calabash.synopsis"> + <title>Synopsis</title> + <cmdsynopsis><command>&cmd;</command> + <arg choice="opt">-h</arg> + <arg choice="opt">-c <replaceable>OLDFILE</replaceable></arg> + <arg choice="opt">-d <replaceable>DECRYPT_KEY</replaceable></arg> + <arg choice="opt">-e</arg> + <arg choice="opt">-i</arg> + <arg choice="opt">-l <replaceable>LOGFILE</replaceable></arg> + <arg choice="opt">-m <replaceable>OPS</replaceable></arg> + <arg choice="opt">-n</arg> + <arg choice="opt">-o <replaceable>OUTFILE</replaceable></arg> + <arg choice="opt">-s <replaceable>WORD_REGEXP</replaceable></arg> + <arg choice="opt">--spell</arg> + <arg choice="opt">--strict</arg> + <arg choice="opt">-t <replaceable>TRANSP</replaceable></arg> + <arg choice="opt">-B</arg> + <arg choice="opt">-C NAME=<replaceable>R</replaceable>,<replaceable>G</replaceable>,<replaceable>B</replaceable></arg> + <arg choice="opt">-D</arg> + <arg choice="opt">-F <replaceable>FIRST_PAGE</replaceable></arg> + <arg choice="opt">-L <replaceable>LAST_PAGE</replaceable></arg> + <arg choice="opt">-M N,E,W,S</arg> + <arg choice="opt">-V</arg> + <arg choice="opt">-X</arg> + <sbr/> + <arg choice="req">INFILE</arg> + <arg choice="opt">INFILE2</arg> + </cmdsynopsis> + </refsynopsisdiv> + + <refsect1> + <title>Positional Arguments</title> + <variablelist> + <varlistentry> + <term><option>INFILE</option></term> + <listitem> + <para>the required PDF input file</para> + </listitem> + </varlistentry> + <varlistentry> + <term><option>INFILE2</option></term> + <listitem> + <para>an optional <quote>newer</quote> PDF input file; + alternate syntax to <option>-c</option></para> + </listitem> + </varlistentry> + </variablelist> + </refsect1> + + <refsect1> + 
<title>Optional Arguments</title> + <variablelist> + <varlistentry id="pdfcompare.below"> + <term><option>-B</option></term> + <term><option>--below</option></term> + <listitem> + <para>Paint the highlight markers below the text. Try this if + the normal merge crashes. Use with care, highlights may + disappear below background graphics. Default: BELOW='FALSE'</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.compare-text"> + <term><option>-c <replaceable>OLDFILE</replaceable></option></term> + <term><option>--compare-text <replaceable>OLDFILE</replaceable></option></term> + <listitem> + <para>Mark added, deleted and replaced text (or see <option>-m</option>) with + regard to <replaceable>OLDFILE</replaceable>. File formats <filename>.pdf</filename>, + <filename>.xml</filename>, <filename>.txt</filename> are + recognized by their suffix. The comparison works word by + word.</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.search-color"> + <term><option>-C NAME=<replaceable>R</replaceable>,<replaceable>G</replaceable>,<replaceable>B</replaceable></option></term> + <term><option>--search-color NAME=<replaceable>R</replaceable>,<replaceable>G</replaceable>,<replaceable>B</replaceable></option></term> + <listitem> + <para>Set colors of the search highlights as an RGB triplet; + R,G,B ranges are 0.0-1.0 each; valid names are + 'add,'delete','change','equal','margin','all'; default name + is 'equal', which is also used for <option>-s</option>; default colors are + A=0.3,1,0.3 /*green*/ C=0.9,0.8,0 /*yellow*/ B=0.9,0.9,0.9 + /*gray*/ E=1,0,1 /*pink*/ D=1,0.3,0.3 /*red*/ M=0.7,1,1 + /*blue*/</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.debug"> + <term><option>-D</option></term> + <term><option>--debug</option></term> + <listitem> + <para>Enable debugging. 
Prints more on stdout, dumps several + <filename>*.xml</filename> and <filename>*.pdf</filename> + files.</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.exclude-irrelevant-pages"> + <term><option>-e</option></term> + <term><option>--exclude-irrelevant-pages</option></term> + <listitem> + <para>With <option>-s</option>; show only matching pages. With + <option>-c</option>: show only changed pages; default: + reproduce all pages from <replaceable>INFILE</replaceable> + in <replaceable>OUTFILE</replaceable></para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.features"> + <term><option>-f <replaceable>FEATURES</replaceable></option></term> + <term><option>--features <replaceable>FEATURES</replaceable></option></term> + <listitem> + <para>Specify how to mark. Allowed values are 'highlight', + 'changebar', 'popup', 'navigation', 'watermark', 'margin'. + Default: H,C,P,N,W,B</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.first-page"> + <term><option>-F <replaceable>FIRST_PAGE</replaceable></option></term> + <term><option>--first-page <replaceable>FIRST_PAGE</replaceable></option></term> + <listitem> + <para>Skip some pages at start of document; see also + <option>-L</option>; default: all pages</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.help"> + <term><option>-h</option></term> + <term><option>--help</option></term> + <listitem> + <para>Show this help message and exit</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.nocase"> + <term><option>-i</option></term> + <term><option>--nocase</option></term> + <listitem> + <para>Make <option>-s</option> case insensitive; default: case + sensitive</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.last-page"> + <term><option>-L <replaceable>LAST_PAGE</replaceable></option></term> + <term><option>--last-page <replaceable>LAST_PAGE</replaceable></option></term> + <listitem> + <para>Limit pages processed; 
this counts pages, it does not + use document page numbers; see also <option>-F</option>; default: all + pages</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.log"> + <term><option>-l <replaceable>LOGFILE</replaceable></option></term> + <term><option>--log <replaceable>LOGFILE</replaceable></option></term> + <listitem> + <para>Write a Python data structure describing all the overlay + objects on each page. Default: none.</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.margins"> + <term><option>-M N,E,W,S</option></term> + <term><option>--margins N,E,W,S</option></term> + <listitem> + <para>Specify margin space to ignore on each page. A margin + width is expressed in units of ca. 100dpi. Specify four + numbers in the order north,east,west,south. Default: + 0,0,0,0</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.mark"> + <term><option>-m <replaceable>OPS</replaceable></option></term> + <term><option>--mark <replaceable>OPS</replaceable></option></term> + <listitem> + <para>Specify what to mark. Used with <option>-c</option>. Allowed values are + 'add','delete','change','equal'. Multiple values can be + listed comma-separated; abbreviations are allowed. 
Default: + A,D,C</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.no-output"> + <term><option>-n</option></term> + <term><option>--no-output</option></term> + <listitem> + <para>Do not write an output file; print diagnostics only; + default: write output file as per <option>-o</option></para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.output"> + <term><option>-o <replaceable>OUTFILE</replaceable></option></term> + <term><option>--output <replaceable>OUTFILE</replaceable></option></term> + <listitem> + <para>Write output to FILE; default: <filename>output.pdf</filename></para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.spell"> + <term><option>--spell</option></term> + <term><option>--spell-check</option></term> + <listitem> + <para>Run the text body of the (new) PDF through <command>hunspell</command>. + Unknown words are underlined. Use e.g. 'env DICTIONARY=en_US + ...' (or de_DE, ...) to specify the spelling dictionary, if + your system has more than one. To add new words to your + private dictionary use e.g. 'echo "ownCloud" >> ~/.hunspell_en_US' + Check with <command>hunspell <option>-D</option></command> and + study <citerefentry> + <refentrytitle>hunspell</refentrytitle> + <manvolnum>1</manvolnum> + </citerefentry>. 
</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.strict"> + <term><option>--strict</option></term> + <listitem> + <para>Show really all differences; default: ignore removed + hyphenation; ignore character spacing inside a word</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.transparency"> + <term><option>-t <replaceable>TRANSP</replaceable></option></term> + <term><option>--transparency <replaceable>TRANSP</replaceable></option></term> + <listitem> + <para>Set transparency of the highlight; invisible: 0.0; full + opaque: 1.0; default: 0.6 </para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.version"> + <term><option>-V</option></term> + <term><option>--version</option></term> + <listitem> + <para>Print the version number and exit</para> + </listitem> + </varlistentry> + <varlistentry id="pdfcompare.no-compression"> + <term><option>-X</option></term> + <term><option>--no-compression</option></term> + <listitem> + <para>Write uncompressed PDF. Default: FlateEncode filter + compression.</para> + </listitem> + </varlistentry> + </variablelist> + </refsect1> +</refentry> diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/pdfcompare.py new/pdfcompare-1.6.8/pdfcompare.py --- old/pdfcompare-1.6.5/pdfcompare.py 2014-01-07 15:28:01.000000000 +0100 +++ new/pdfcompare-1.6.8/pdfcompare.py 2016-04-19 17:23:08.000000000 +0200 @@ -1,9 +1,9 @@ #! /usr/bin/python # -*- coding: UTF-8 -*- # -# pdf_highlight.py -- command line tool to show search or compare results in a PDF +# pdfcompare.py -- command line tool to show search or compare results in a PDF # -# (c) 2012-2013 Juergen Weigert [email protected] +# (c) 2012-2016 Juergen Weigert [email protected] # Distribute under GPL-2.0 or ask # # 2012-03-16, V0.1 jw - initial draught: argparse, pdftohtml-xml, font.metrics @@ -88,6 +88,12 @@ # later on. Strange. 
# 2014-01-07, V1.6.5 jw - manually merged https://github.com/jnweiger/pdfcompare/pull/4 # hope, I did not break too much... +# 2014-11-07, V1.6.6 jw - hint added for hunspell use: add word. +# 2015-04-18, V1.6.7 jw - fall back to pyPdf from PyPDF2, for Ubuntu 14.04 LTS +# 2015-04-19, V1.6.8 jw - popup pN[tcb]: source location descriptors optional. +# No normal user expects or understands them. +# No navigation marks per default. They are often broken, and often +# useless due to page number changes. Include in -f to enable. # # osc in devel:languages:python python-pypdf >= 1.13+20130112 # need fix from https://bugs.launchpad.net/pypdf/+bug/242756 @@ -113,9 +119,9 @@ # Compatibility for older Python versions from __future__ import with_statement from __future__ import print_function -# from __future__ import division +from __future__ import division -__VERSION__ = '1.6.5' +__VERSION__ = '1.6.8' try: # python2 @@ -123,7 +129,12 @@ except ImportError: # python3, breaks python2-reportlab from io import StringIO -from pyPdf import PdfFileWriter, PdfFileReader, generic as Pdf +try: + # Ubuntu 15.x + from PyPDF2 import PdfFileWriter, PdfFileReader, generic as Pdf +except ImportError: + # Ubuntu 14.04 LTS + from pyPdf import PdfFileWriter, PdfFileReader, generic as Pdf from reportlab.pdfgen import canvas from reportlab.lib.colors import Color import urllib # used when normal encode fails. @@ -150,6 +161,7 @@ highlight_height = 1.2 # some fonts cause too much overlap with 1.4 # 1.2 is often not enough to look symmetric. +anno_popup_src_loc_ref = False # False: 'chg: bla' True: 'chg:p1t: bla' # from pdfminer.fontmetrics import FONT_METRICS # FONT_METRICS['Helvetica'][1]['W'] @@ -283,7 +295,10 @@ text = mark.get('t', '.') + ':' if 'o' in mark: if isinstance(mark['o'], list): - text += mark['o'][1]+': '+ mark['o'][0] + if anno_popup_src_loc_ref: + text += mark['o'][1]+': '+ mark['o'][0] + else: + text += ' '+mark['o'][0] else: text += ' '+mark['o'] # need ascii here. 
anything else triggers @@ -585,7 +600,7 @@ i = word[2] l = len(word[0]) - char_width = float(x2-x1)/len(word[1]) + char_width = (x2-x1)/len(word[1]) x1 += i * char_width x2 = x1 + l * char_width # Given the fast track above, maybe for the rest, a @@ -713,7 +728,7 @@ return finfo def main(): - parser = ArgumentParser(epilog="version: "+__VERSION__, description="highlight words in a PDF file.") + parser = ArgumentParser(epilog="version: "+__VERSION__, description="Highlight changed/added/deleted/moved text in a PDF file.") parser.def_trans = 0.6 parser.def_decrypt_key = '' parser.def_colors = { 'E': [1,0,1, 'pink'], # extra @@ -724,75 +739,87 @@ 'B': [.9,.9,.9, 'gray'] } # borders parser.def_output = 'output.pdf' parser.def_marks = 'A,D,C' - parser.def_features = 'H,C,P,N,W,B' + parser.def_features = 'H,C,P,W,B' parser.def_margins = '0,0,0,0' parser.def_margins = '0,0,0,0' parser.def_below = False parser.add_argument("-c", "--compare-text", metavar="OLDFILE", - help="mark added, deleted and replaced text (or see -m) with regard to OLDFILE. \ + help="Mark added, deleted and replaced text (or see -m) with regard to OLDFILE. \ File formats .pdf, .xml, .txt are recognized by their suffix. \ The comparison works word by word.") parser.add_argument("-d", "--decrypt-key", metavar="DECRYPT_KEY", default=parser.def_decrypt_key, - help="open an encrypted PDF; default: KEY='"+parser.def_decrypt_key+"'") + help="Open an encrypted PDF. Default: KEY='"+parser.def_decrypt_key+"'") parser.add_argument("-e", "--exclude-irrelevant-pages", default=False, action="store_true", - help="with -s: show only matching pages; with -c: show only changed pages; \ - default: reproduce all pages from INFILE in OUTFILE") + help="With -s: show only matching pages; with -c: show only changed pages. \ + Default: reproduce all pages from INFILE in OUTFILE.") parser.add_argument("-f", "--features", metavar="FEATURES", default=parser.def_features, - help="specify how to mark. 
Allowed values are 'highlight', 'changebar', 'popup', \ + help="Specify how to mark. Allowed values are 'highlight', 'changebar', 'popup', \ 'navigation', 'watermark', 'margin'. Default: " + str(parser.def_features)) parser.add_argument("-i", "--nocase", default=False, action="store_true", - help="make -s case insensitive; default: case sensitive") + help="Make -s case insensitive; default: case sensitive.") parser.add_argument("-l", "--log", metavar="LOGFILE", - help="write an python datastructure describing all the overlay objects on each page. Default none.") + help="Write an python datastructure describing all the overlay objects on each page. Default none.") parser.add_argument("-m", "--mark", metavar="OPS", default=parser.def_marks, - help="specify what to mark. Used with -c. Allowed values are 'add','delete','change','equal'. \ + help="Specify what to mark. Used with -c. Allowed values are 'add','delete','change','equal'. \ Multiple values can be listed comma-seperated; abbreviations are allowed.\ Default: " + str(parser.def_marks)) parser.add_argument("-n", "--no-output", default=False, action="store_true", - help="do not write an output file; print diagnostics only; default: write output file as per -o") + help="Do not write an output file; print diagnostics only. Default: write output file as per -o option.") parser.add_argument("-o", "--output", metavar="OUTFILE", default=parser.def_output, - help="write output to FILE; default: "+parser.def_output) + help="Write output to FILE; default: "+parser.def_output) parser.add_argument("-s", "--search", metavar="WORD_REGEXP", - help="highlight WORD_REGEXP") + help="Highlight WORD_REGEXP") parser.add_argument("--spell", "--spell-check", default=False, action="store_true", - help="run the text body of the (new) pdf through hunspell. Unknown words are underlined. Use e.g. 'env DICTIONARY=de_DE ...' (or en_US, ...) to specify the spelling dictionary, if your system has more than one. 
Check with 'hunspell -D' and study 'man hunspell'.") + help="Run the text body of the (new) pdf through hunspell. Unknown words are underlined. \ + Use e.g. 'env DICTIONARY=en_US ...' (or de_DE, ...) to specify the spelling dictionary, \ + if your system has more than one. To add new words to your private dictionary use e.g. \ + 'echo >> ~/.hunspell_en_US ownCloud'. Check with 'hunspell -D' and study 'man hunspell'.") parser.add_argument("--strict", default=False, action="store_true", - help="show really all differences; default: ignore removed hyphenation; ignore character spacing inside a word") + help="Show really all differences. Default: ignore removed hyphenation; \ + ignore character spacing inside a word.") parser.add_argument("-t", "--transparency", type=float, default=parser.def_trans, metavar="TRANSP", - help="set transparency of the highlight; invisible: 0.0; full opaque: 1.0; \ + help="Set transparency of the highlight; invisible: 0.0; full opaque: 1.0; \ default: " + str(parser.def_trans)) parser.add_argument("-B", "--below", default=parser.def_below, action="store_true", - help="Paint the highlight markers below the text. Try this if the normal merge crashes. Use with care, highlights may disappear below background graphics. Default: BELOW='"+str(parser.def_below)+"'") + help="Paint the highlight markers below the text. Try this if the normal merge crashes. Use with care, highlights may disappear below background graphics. 
Default: BELOW='"+str(parser.def_below)+"'.") parser.add_argument("-C", "--search-color", metavar="NAME=R,G,B", action="append", - help="set colors of the search highlights as an RGB triplet; R,G,B ranges are 0.0-1.0 each; valid names are 'add,'delete','change','equal','margin','all'; default name is 'equal', which is also used for -s; default colors are " + + help="Set colors of the search highlights as an RGB triplet; R,G,B ranges are 0.0-1.0 each; valid names are 'add,'delete','change','equal','margin','all'; default name is 'equal', which is also used for -s; default colors are " + " ".join(["%s=%s,%s,%s /*%s*/ " %(x_y[0],x_y[1][0],x_y[1][1],x_y[1][2],x_y[1][3]) for x_y in list(parser.def_colors.items())])) parser.add_argument("-D", "--debug", default=False, action="store_true", - help="enable debugging. Prints more on stdout, dumps several *.xml or *.pdf files.") + help="Enable debugging. Prints more on stdout, dumps several *.xml or *.pdf files.") parser.add_argument("-F", "--first-page", metavar="FIRST_PAGE", - help="skip some pages at start of document; see also -L; default: all pages") + help="Skip some pages at start of document; see also -L option. Default: all pages.") parser.add_argument("-L", "--last-page", metavar="LAST_PAGE", - help="limit pages processed; this counts pages, it does not use document \ - page numbers; see also -F; default: all pages") + help="Limit pages processed; this counts pages, it does not use document \ + page numbers; see also -F; default: all pages.") parser.add_argument("-M", "--margins", metavar="N,E,W,S", default=parser.def_margins, - help="specify margin space to ignore on each page. A margin width is expressed \ + help="Specify margin space to ignore on each page. A margin width is expressed \ in units of ca. 100dpi. Specify four numbers in the order north,east,west,south. 
Default: "\ + str(parser.def_margins)) + parser.add_argument("-S", "--source-location", default=False, action="store_true", + help="Annotation start includes :pNX: markers where 'N' is the page number of the location \ + in the original document and X is 't' for top, 'c' for center, or 'b' for bottom of the page. \ + Default: Annotations start only with 'chg:', 'add:', 'del:' optionally followed by original text.") parser.add_argument("-V", "--version", default=False, action="store_true", - help="print the version number and exit") + help="Print the version number and exit.") parser.add_argument("-X", "--no-compression", default=False, action="store_true", - help="write uncompressed PDF. Default: FlateEncode filter compression.") + help="Write uncompressed PDF. Default: FlateEncode filter compression.") parser.add_argument("--leftside", default=False, action="store_true", - help="put changebars and navigation at the left hand side of the page. Default: right hand side.") - parser.add_argument("infile", metavar="INFILE", help="the input file") - parser.add_argument("infile2", metavar="INFILE2", nargs="?", help="optional 'newer' input file; alternate syntax to -c") + help="Put changebars and navigation at the left hand side of the page. Default: right hand side.") + parser.add_argument("infile", metavar="INFILE", help="The input file.") + parser.add_argument("infile2", metavar="INFILE2", nargs="?", help="Optional 'newer' input file; alternate syntax to -c") args = parser.parse_args() # --help is automatic args.transparency = 1 - args.transparency # it is needed reversed. 
if args.version: parser.exit(__VERSION__) + global debug debug = args.debug + global anno_popup_src_loc_ref + anno_popup_src_loc_ref = args.source_location + args.search_colors = parser.def_colors.copy() if args.search_color: for col in args.search_color: @@ -821,7 +848,7 @@ args.compare_text,args.infile = args.infile,args.infile2 if args.search is None and args.compare_text is None and args.spell is None: - parser.exit("Oops. Nothing to do. Specify either -s or --spell or -c or two input files") + parser.exit("Oops. Nothing to do. Specify either -s or --spell or -c or two input files.") if not os.access(args.infile, os.R_OK): parser.exit("Cannot read input file: %s" % args.infile) @@ -926,13 +953,14 @@ print("DocumentInfo():") pprint(di) output._objects.append(di) - except Exception,e: + except Exception as e: print("WARNING: getDocumentInfo() failed: " + str(e) ) output._info = Pdf.IndirectObject(len(output._objects), 0, output) pages_written = 0 total_hits = 0 + outline = [] page_idx = 0 nav_bwd = None @@ -961,6 +989,7 @@ if hitdetails[det]: hits_fmt += '%s%d' % (ch,hitdetails[det]) print(" page %d: %d hits %s" % (page_marks[i]['nr'], len(page_marks[i]['rect']), hits_fmt)) + outline.append(" page %d: %d hits %s" % (page_marks[i]['nr'], len(page_marks[i]['rect']), hits_fmt)) # pprint(hitdetails) page = input1.getPage(i) @@ -1009,8 +1038,15 @@ output.addPage(page) pages_written += 1 - print("saving %s" % args.output) + # add outline + try: + parent = output.addBookmark('Hits', 0) # add parent bookmark + for bm in outline: + output.addBookmark(bm,outline.index(bm),parent=parent) + except Exception as e: + print("Warning: cannot add Bookmarks (pyPdf too old?): %s" % str(e)) + if args.no_output is False: outputStream = file(args.output, "wb") try: @@ -1068,7 +1104,7 @@ if (width is not None): tot_w = pre_w+str_w+suf_w if (tot_w == 0): tot_w = 1 - ratio = float(width)/tot_w + ratio = width/tot_w #pprint([[pre,str,suf,width],[pre_w,str_w,suf_w,tot_w],ratio]) return 
(xoff+pre_w*ratio, str_w*ratio) @@ -1159,9 +1195,9 @@ def catwords(dw, idx1, idx2, maxwords=666): # make maxwords low enough, so that the popup fits on the screen. if (maxwords is not None and idx2-idx1 > maxwords): - cw1_text, cw1_loc = catwords(dw, idx1, idx1+maxwords/3, None) - cw2_text, cw2_loc = catwords(dw, idx2-maxwords/3, idx2, None) - text = cw1_text + ("<br><br> --]-------- snip %d words --------[-- <br><br>" % (idx2-idx1-maxwords*2/3)) + cw2_text + cw1_text, cw1_loc = catwords(dw, idx1, idx1+int(maxwords/3), None) + cw2_text, cw2_loc = catwords(dw, idx2-int(maxwords/3), idx2, None) + text = cw1_text + ("<br><br> --]-------- snip %d words --------[-- <br><br>" % (idx2-idx1-int(maxwords*2/3))) + cw2_text return [ text, cw1_loc ] text = "" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/runtests.py new/pdfcompare-1.6.8/runtests.py --- old/pdfcompare-1.6.5/runtests.py 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/runtests.py 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,16 @@ +#!/usr/bin/python +# -*- coding: utf-8 -*- + +import pytest +import sys + +class MyPlugin: + def pytest_sessionfinish(self): + print("\n*** test run reporting finishing") + + +# Empty statement here needed so minversion reports no error +#pytest + +pytest.main(plugins=[MyPlugin()] ) + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/setup.py new/pdfcompare-1.6.8/setup.py --- old/pdfcompare-1.6.5/setup.py 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/setup.py 2016-04-19 17:23:08.000000000 +0200 @@ -0,0 +1,40 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- +import sys + +from distutils.core import setup +from setuptools.command.test import test as TestCommand + +class PyTest(TestCommand): + def finalize_options(self): + TestCommand.finalize_options(self) + self.test_args = [] + self.test_suite = True + def run_tests(self): + #import 
here, cause outside the eggs aren't loaded + import pytest + errno = pytest.main(self.test_args) + sys.exit(errno) + + +setup(name='pdfcompare', + version='1.0', + description='Compare two PDF files', + author='Jürgen Weigert', + author_email='[email protected]', + url='https://github.com/jnweiger/pdfcompare', + scripts=['pdfcompare.py', 'imgcmp.py'], + license='GPL-2.0', + classifiers=[ + 'License :: OSI Approved :: GNU General Public License v2 (GPLv2)', + 'Environment :: Console', + 'Development Status :: 5 - Production/Stable', + 'Programming Language :: Python :: 2.7', + 'Programming Language :: Python :: 3', + ], + cmdclass={'test': PyTest}, + long_description="".join(open('README.txt').readlines()), + tests_require=['pytest', 'scipy'], + #packages=['pyPdf','reportlab.pdfgen','reportlab.lib.colors','pygame.font' ], +# + ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/test/Makefile new/pdfcompare-1.6.8/test/Makefile --- old/pdfcompare-1.6.5/test/Makefile 2013-10-24 14:09:14.000000000 +0200 +++ new/pdfcompare-1.6.8/test/Makefile 2016-04-19 17:23:08.000000000 +0200 @@ -1,10 +1,16 @@ -VER=1.3 refresh= all: test -test: - ln -sf ../pdf_highlight.py pdfcompare +test_requires: + @echo The selftest uses the following extra packages: + @rpm -q shunit2 || exit 2 + @rpm -q python-scipy || exit 2 + @rpm -q pdftk || exit 2 + @echo ----------------------------------------------- + +test: test_requires + ln -sf ../pdfcompare.py pdfcompare env PATH=.:$$PATH sh ./helptest.sh $(VER) VER=$(VER) env PATH=.:$$PATH sh ./python3.sh env PATH=.:$$PATH refresh=$(refresh) sh ./restest.sh diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/pdfcompare-1.6.5/test/cli/test_cli.py new/pdfcompare-1.6.8/test/cli/test_cli.py --- old/pdfcompare-1.6.5/test/cli/test_cli.py 1970-01-01 01:00:00.000000000 +0100 +++ new/pdfcompare-1.6.8/test/cli/test_cli.py 2016-04-19 17:23:08.000000000 +0200 @@ 
-0,0 +1,28 @@ +#!/usr/bin/python +# -*- coding: utf-8 -*- + +import os.path + + +def test_version(): + """ + Checks, if version number in last line of help output is available + """ + import subprocess + L=subprocess.check_output(['./pdf_highlight.py','-h']) + L=L.strip() + LL=L.split("\n") + assert 'version' in LL[-1] + + + +def test_pdfcompare_exists(): + assert os.path.exists('pdf_highlight.py') + +def test_scipy(): + """ + Checks, if the module scipy is available + """ + import scipy + assert scipy.__version__ + ++++++ pdfcompare.dsc ++++++ Format: 1.0 Source: pdfcompare Version: 1.6.8-1 Binary: pdfcompare Maintainer: Jürgen Weigert <[email protected]> Architecture: any Build-Depends: debhelper (>= 4.2.21) # https://github.com/openSUSE/obs-build/pull/147 DEBTRANSFORM-RELEASE: 1 ++++++ pull_github.sh ++++++ #! /bin/sh # [email protected]:jnweiger/pdfcompare.git name=pdfcompare rm -rf $name tstamp=$(date +%Y%m%d) git clone --depth 1 --branch master $url -o $name version=$(grep '^__VERSION__' $name/pdfcompare.py | sed -e "s@.*'\(.*\)'.*@\1@") #version=$version.git$tstamp mv $name $name-$version rm $name-*.tar.bz2 tar jcvf $name-$version.tar.bz2 --exclude '.??*' $name-$version rm -rf $name-$version sed -i -e "s@^\(Version:\s*\).*@\1"$version"@" *.spec sed -i -e "s@^\(Source0:\s*\).*@\1"$name-$version.tar.bz2"@" *.spec osc addremove echo "now run: vi *.dsc; debchange -mc debian.changelog; osc vc; osc up; osc ci"
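Note on the V1.6.8 popup change in pdfcompare.py above: the hunk around anno_popup_src_loc_ref switches the popup text between 'chg: bla' and 'chg:p1t: bla'. The following is a minimal standalone sketch of that branch, not the code from the tarball. The function name popup_label and the keyword with_source_location are hypothetical; in the diff the switch is the module-level variable anno_popup_src_loc_ref, set from -S/--source-location. The mark layout ('t' for the change type, 'o' for either a string or a [text, location] list) is taken from the hunk.

```python
def popup_label(mark, with_source_location=False):
    """Build the leading text of a popup annotation (hypothetical helper).

    mark['t'] is the change type, e.g. 'chg', 'add' or 'del'.
    mark['o'] is either the original text, or a list
    [original_text, location] where location is a descriptor
    such as 'p1t' (page 1, top of page).
    """
    text = mark.get('t', '.') + ':'
    if 'o' in mark:
        orig = mark['o']
        if isinstance(orig, list):
            if with_source_location:
                # -S given: keep the pN[tcb] descriptor in front of the text
                text += orig[1] + ': ' + orig[0]
            else:
                # default since V1.6.8: original text only
                text += ' ' + orig[0]
        else:
            text += ' ' + orig
    return text


if __name__ == '__main__':
    mark = {'t': 'chg', 'o': ['bla', 'p1t']}
    print(popup_label(mark))                             # chg: bla
    print(popup_label(mark, with_source_location=True))  # chg:p1t: bla
```

The two printed strings correspond to the comment next to anno_popup_src_loc_ref in the diff ("False: 'chg: bla' True: 'chg:p1t: bla'").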

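Note on the V1.6.7 change in pdfcompare.py above: the import of PdfFileWriter, PdfFileReader and generic now tries PyPDF2 first and falls back to pyPdf for Ubuntu 14.04 LTS. The same try/except ImportError pattern can be sketched in a generic form. This is only an illustration under assumptions: the helper import_first_available is hypothetical and not part of pdfcompare, and the sketch does not check whether the module it finds offers the names pdfcompare needs.

```python
import importlib


def import_first_available(candidates):
    """Return (name, module) for the first importable module in candidates.

    Hypothetical helper; raises ImportError if none can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # not installed, try the next candidate
            continue
    raise ImportError("none of these modules is installed: "
                      + ", ".join(candidates))


if __name__ == '__main__':
    try:
        # same preference order as in the diff: PyPDF2 first, then pyPdf
        name, pdf_lib = import_first_available(['PyPDF2', 'pyPdf'])
        print("using " + name)
    except ImportError as err:
        print("no PDF library found: " + str(err))
```

The diff itself keeps the plain nested try/except form, which has the advantage that the imported names (PdfFileWriter, PdfFileReader, Pdf) end up in the module namespace without further lookups.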