Bug#962872: ocrmypdf: new major upstream version available

2020-07-20 Thread Sean Whitton
Hello James,

On Sun 19 Jul 2020 at 08:00AM -07, James R Barlow wrote:

> I updated debian/copyright in both projects at the HEAD revision (not a
> tagged release). These files should reflect the current status.

Great.  I see you merged in my d/copyright.  Previously I'd not wanted
to bother you with that, but going forward, if I update d/copyright,
would you like PRs from me, or would you prefer to just merge in my
changes before making your own updates?

> I believe this means the updates shouldn't be too difficult, and also
> that the -dfsg version tag could be dropped from both
> packages. (pikepdf is now powerful enough that I can usually
> synthesize problematic constructs instead of adding another test
> resource.)

Thank you for the details here -- I will look into verifying whether it
can be dropped.

-- 
Sean Whitton





2020-07-19 Thread James R Barlow
Sean and Rogério,

It's easiest for everyone if the difference between upstream and packages
is as small as possible, so I've been working on removing files that are
problematic for Debian.

In recent releases I have been removing all files that were previously in
Files-Excluded, except for:
pikepdf:docs/images/save-pike.jpg - public domain image of a sign likely
produced by a government agency in Ireland
ocrmypdf:docs/logo/logo - as we previously discussed, the .svg is now the
master version of the logo, and can be edited by open source tools.

In ocrmypdf, there are no new test resources since 9.8. I believe that the
patch that drops a test in tests/test_metadata.py can also be removed: the
test previously used a resource with problematic copyright status, which is
probably why the patch was added.

In pikepdf, there are a few synthetic files I generated, and
pikepdf:tests/resources/jbig2global.pdf is a PDF'd copy of
ocrmypdf:tests/resources/typewriter.png. The patch
"disable test_icc_extract.patch" can also be dropped, since the resource
that test used has been replaced with an image I generated.

I updated debian/copyright in both projects at the HEAD revision (not a
tagged release). These files should reflect the current status.

I believe this means the updates shouldn't be too difficult, and also that
the -dfsg version tag could be dropped from both packages. (pikepdf is now
powerful enough that I can usually synthesize problematic constructs
instead of adding another test resource.)

James

On Sat, Jul 18, 2020 at 12:06 PM Sean Whitton wrote:
>
> Hello Rogério,
>
> On Mon 15 Jun 2020 at 09:13AM -03, Rogério Brito wrote:
>
> > A new major upstream version (10.0.1) of ocrmypdf was released a few days
> > ago and it is *so much faster* than the previous versions 8.x, 9.x,
> > especially during the (painful) initial step of "Scanning".
> >
> > I installed it via pip in a virtual environment and it works very well and
> > many hours of users will be saved if this new version is made available for
> > users of Debian in general.
>
> Thank you for letting me know about the speed improvements.
>
> The main thing blocking updating both pikepdf and ocrmypdf -- which I
> try to do together since upstream is the same -- is updating d/copyright
> for all the new test resources which are included.
>
> This often requires looking up licenses on commons.wikimedia.org, and
> adding new files to Files-Excluded:.
>
> Perhaps you would be interested in helping out?
>
> What you would need to do is something like `git diff --name-status
> --diff-filter=ADR v1.13.0..v1.17.2` (versions are for pikepdf) and then
> work on a patch to d/copyright.
>
> All the other parts of the packaging, including actually applying
> Files-Excluded:, I can deal with easily myself.
>
> --
> Sean Whitton



2020-07-18 Thread Sean Whitton
Hello Rogério,

On Mon 15 Jun 2020 at 09:13AM -03, Rogério Brito wrote:

> A new major upstream version (10.0.1) of ocrmypdf was released a few days
> ago and it is *so much faster* than the previous versions 8.x, 9.x,
> especially during the (painful) initial step of "Scanning".
>
> I installed it via pip in a virtual environment and it works very well and
> many hours of users will be saved if this new version is made available for
> users of Debian in general.

Thank you for letting me know about the speed improvements.

The main thing blocking updating both pikepdf and ocrmypdf -- which I
try to do together since upstream is the same -- is updating d/copyright
for all the new test resources which are included.

This often requires looking up licenses on commons.wikimedia.org, and
adding new files to Files-Excluded:.

Perhaps you would be interested in helping out?

What you would need to do is something like `git diff --name-status
--diff-filter=ADR v1.13.0..v1.17.2` (versions are for pikepdf) and then
work on a patch to d/copyright.
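
A minimal sketch of that listing step, assuming a local clone of the
pikepdf repository that contains both tags. The helper names
`parse_name_status` and `changed_files` are hypothetical and not part of any
packaging tool; only the git command itself comes from the text above.

```python
import subprocess
from collections import defaultdict


def parse_name_status(output):
    """Group paths from `git diff --name-status` output by change type."""
    changes = defaultdict(list)
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][0]  # a rename is reported as e.g. "R100"; keep "R"
        if status == "R":
            # Renames list the old path first, then the new path.
            changes["R"].append((fields[1], fields[2]))
        else:
            changes[status].append(fields[1])
    return dict(changes)


def changed_files(old_tag, new_tag, repo="."):
    """Run the git command from the message and parse its output.

    Assumes `repo` is a git checkout in which both tags exist.
    """
    result = subprocess.run(
        ["git", "diff", "--name-status", "--diff-filter=ADR",
         f"{old_tag}..{new_tag}"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return parse_name_status(result.stdout)
```

Calling `changed_files("v1.13.0", "v1.17.2")` inside the clone would return
the added ("A"), deleted ("D") and renamed ("R") paths, which are the ones to
review against d/copyright.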

All the other parts of the packaging, including actually applying
Files-Excluded:, I can deal with easily myself.

-- 
Sean Whitton





2020-06-15 Thread Rogério Brito
Package: ocrmypdf
Version: 9.8.0+dfsg-1
Severity: wishlist

Hi, Sean.

A new major upstream version (10.0.1) of ocrmypdf was released a few days
ago and it is *so much faster* than the previous versions 8.x, 9.x,
especially during the (painful) initial step of "Scanning".

I installed it via pip in a virtual environment and it works very well and
many hours of users will be saved if this new version is made available for
users of Debian in general.


Thanks so much for caring about ocrmypdf,

Rogério Brito.

-- System Information:
Debian Release: bullseye/sid
  APT prefers testing
  APT policy: (500, 'testing'), (200, 'unstable'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.6.0-2-rt-amd64 (SMP w/4 CPU cores; PREEMPT)
Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.utf-8, LC_CTYPE=pt_BR.utf-8 (charmap=UTF-8), 
LANGUAGE=en_US.utf-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages ocrmypdf depends on:
ii  ghostscript                                          9.52~dfsg-1
ii  icc-profiles-free                                    2.0.1+dfsg-1
ii  liblept5                                             1.79.0-1
ii  python3                                              3.8.2-3
ii  python3-cffi-backend [python3-cffi-backend-api-min]  1.14.0-2
pn  python3-cffi-backend-api-max                         <none>
ii  python3-chardet                                      3.0.4-7
ii  python3-img2pdf                                      0.3.6-1
ii  python3-pdfminer                                     20191020+dfsg-2
ii  python3-pikepdf                                      1.13.0+dfsg-1
ii  python3-pil                                          7.0.0-4+b1
ii  python3-pkg-resources                                46.1.3-1
ii  python3-reportlab                                    3.5.34-1
ii  python3-tqdm                                         4.43.0-1
ii  tesseract-ocr                                        4.1.1-2+b1
ii  zlib1g                                               1:1.2.11.dfsg-2

Versions of packages ocrmypdf recommends:
ii  pngquant  2.12.2-1
ii  unpaper   6.1-2+b2

Versions of packages ocrmypdf suggests:
ii  img2pdf          0.3.6-1
pn  ocrmypdf-doc     <none>
pn  python-watchdog  <none>

-- no debconf information

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFC
http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br