Bug#1023273: Old version is not working
The current maintainer of ocrmypdf and pikepdf is looking for a new maintainer, if someone else is able. On Sun, Nov 13, 2022 at 6:06 AM Anton Gladky wrote: > The newer 14 version of ocrmypdf is needed to support > ghostscript 10. > > I have checked and can confirm that 14.0.1 is working well. > > Regards > > Anton
Bug#976092: ocrmypd fails autopkg tests in testing
This is almost certainly a problem with how Debian is compiling or linking ghostscript with libjbig2dec. This error would be reproducible with: gs -sDEVICE=pngmono -o out.png any_pdf_that_contains_a_jbig2_image.pdf Debian's test suite for ghostscript is just a simple smoke test, so ocrmypdf frequently uncovers problems with ghostscript. James On Sun, Nov 29, 2020 at 7:42 AM Matthias Klose wrote: > Package: src:ocrmypdf > Version: 10.3.1+dfsg-1 > Severity: serious > Tags: sid bullseye > > ocrmypd fails autopkg tests in testing, but not in unstable. Looks like a > missing break on some dependency? > > see https://ci.debian.net/packages/o/ocrmypdf > > [...] > resources = > > PosixPath('/tmp/autopkgtest-lxc.fy1hic18/downtmp/build.Qgv/src/tests/resources') > outdir = > PosixPath('/tmp/pytest-of-debci/pytest-0/test_rotate_deskew_timeout0') > > def test_rotate_deskew_timeout(resources, outdir): > check_ocrmypdf( > resources / 'rotated_skew.pdf', > outdir / 'deskewed.pdf', > '--rotate-pages', > '--rotate-pages-threshold', > '0', > '--deskew', > '--tesseract-timeout', > '0', > '--pdf-renderer', > 'sandwich', > ) > > correlation = check_monochrome_correlation( > outdir, > reference_pdf=resources / 'ccitt.pdf', > reference_pageno=1, > test_pdf=outdir / 'deskewed.pdf', > test_pageno=1, > ) > > # Confirm that the page still got deskewed > > assert correlation > 0.50 > E assert 0.0 > 0.5 > > tests/test_rotation.py:214: AssertionError > - Captured stderr call > - > > Scanning contents: 0%| | 0/1 [00:00 Scanning contents: 100%|██| 1/1 [00:00<00:00, 256.25page/s] > > OCR: 0%| | 0.0/1.0 [00:00 OCR: 50%|█ | 0.5/1.0 [00:00<00:00, 1.62page/s] > OCR: 100%|██| 1.0/1.0 [00:00<00:00, 3.19page/s] > > JPEGs: 0image [00:00, ?image/s] > JPEGs: 0image [00:00, ?image/s] > > JBIG2: 0item [00:00, ?item/s] > JBIG2: 0item [00:00, ?item/s] > -- Captured log call > --- > INFO ocrmypdf.builtin_plugins.tesseract_ocr:tesseract_ocr.py:136 Using > Tesseract OpenMP thread limit 2 > 
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:134 jbig2dec FATAL ERROR > decoding image: incompatible jbig2dec header (0.18) and library (0.19) > versions > Error > reading a > content stream. The page may be incomplete. > > Output may > be incorrect. > Error: File > did > not complete the page properly and may be damaged. > > Output may > be incorrect. > INFO ocrmypdf._pipeline:_pipeline.py:401 with existing rotation ⇨, > page is > facing ⇧, confidence 0.00 - rotation appears correct > ERROR ocrmypdf._exec.ghostscript:ghostscript.py:134 jbig2dec FATAL ERROR > decoding image: incompatible jbig2dec header (0.18) and library (0.19) > versions > Error > reading a > content stream. The page may be incomplete. > > Output may > be incorrect. > Error: File > did > not complete the page properly and may be damaged. > > Output may > be incorrect. > WARNING ocrmypdf._pipeline:_pipeline.py:738 Some input metadata could not > be > copied because it is not permitted in PDF/A. You may wish to examine the > output > PDF's XMP metadata. > INFO ocrmypdf.optimize:optimize.py:589 Optimize ratio: 1.00 savings: > 0.0% > INFO ocrmypdf._sync:_sync.py:381 Output file is a PDF/A-2B (as > expected) > 4 failed, 240 passed, 37 skipped, 1 xfailed in 359.28 seconds > = > autopkgtest [08:20:25]: test test-suite: ---] > autopkgtest [08:20:25]: test test-suite: - - - - - - - - - - results - - > - - - - -
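As a hedged illustration of the reproduction step described in this message, the sketch below wraps the same `gs -sDEVICE=pngmono` invocation in Python and scans Ghostscript's output for the jbig2dec header/library mismatch seen in the log. The function names and the regular expression are assumptions made for this example; they are not part of ocrmypdf or Ghostscript.

```python
import re
import subprocess

# Pattern modelled on the error text in the quoted log (an assumption about
# its general shape; other jbig2dec versions may word it differently).
MISMATCH = re.compile(
    r"incompatible jbig2dec header \(([\d.]+)\) and library \(([\d.]+)\)"
)


def find_jbig2dec_mismatch(output: str):
    """Return (header_version, library_version) if the mismatch is reported."""
    # Collapse whitespace so the pattern still matches a wrapped message.
    match = MISMATCH.search(" ".join(output.split()))
    return match.groups() if match else None


def render_with_ghostscript(pdf_path: str, png_path: str = "out.png"):
    """Run the reproduction command from the message and check its output."""
    result = subprocess.run(
        ["gs", "-sDEVICE=pngmono", "-o", png_path, pdf_path],
        capture_output=True,
        text=True,
        check=False,  # a failing render is exactly what we want to inspect
    )
    return find_jbig2dec_mismatch(result.stdout + result.stderr)
```

Calling `render_with_ghostscript()` on any PDF that contains a JBIG2 image should return `None` on a correctly linked Ghostscript and a version pair on an affected one, assuming the error wording matches the log.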
Bug#939044: ocrmypdf: autopkgtest not compatible with new pikepdf, ghostscript and/or pytest
Sean Whitton and I confirmed the issue still occurs with Ghostscript 9.28rc2. I reported the issue with Ghostscript here: https://bugs.ghostscript.com/show_bug.cgi?id=701552 On Fri, Sep 6, 2019 at 1:58 AM Jonas Smedegaard wrote: > > Quoting James R Barlow (2019-09-06 10:15:59) > > On Thu, Sep 5, 2019 at 11:57 PM Jonas Smedegaard wrote: > > > > > > Quoting Sean Whitton (2019-09-06 06:20:47) > > > > On Sat 31 Aug 2019 at 03:58PM +02, Jonas Smedegaard wrote: > > > > > > > > > Possibly some of the other tools uses undocumented insecure > > > > > ghostscript calls which was recently removed. > > > > > > > > > > To investigate that further, someone needs to extract the actual > > > > > input (probably Postscript or PDF) and the exact command used to > > > > > call ghostscript. > > > > > > > > This was indeed a problem and ocrmypdf upstream has fixed it in > > > > the latest release. > > > > > > Ah, great that the cause has been located! > > > > > > ...and happy that my guess was correct :-) > > > > Not quite? ocrmypdf did not use any undocumented ghostscript calls. It > > followed an example from Ghostscript's documentation almost verbatim > > to generate a .ps from a template that tells Ghostscript to insert an > > ICC profile, referenced by filename. Ghostscript 9.28 is disabling > > access to all files from a .ps file unless safety is explicitly > > disabled. So nothing undocumented or exploitable was happening. (But > > it does make sense for Ghostscript to make the change.) > > > > It does mean any other software that uses Ghostscript to generate > > PDF/X, PDF/E, or PDF/A is likely going to break as well with this > > release. > > Thanks for the clarification - helps me not spread any further false > information! > > - Jonas > > -- > * Jonas Smedegaard - idealist & Internet-arkitekt > * Tlf.: +45 40843136 Website: http://dr.jones.dk/ > > [x] quote me freely [ ] ask before reusing [ ] keep private
Bug#939044: ocrmypdf: autopkgtest not compatible with new pikepdf, ghostscript and/or pytest
On Thu, Sep 5, 2019 at 11:57 PM Jonas Smedegaard wrote: > > Quoting Sean Whitton (2019-09-06 06:20:47) > > On Sat 31 Aug 2019 at 03:58PM +02, Jonas Smedegaard wrote: > > > > > Possibly some of the other tools uses undocumented insecure > > > ghostscript calls which was recently removed. > > > > > > To investigate that further, someone needs to extract the actual > > > input (probably Postscript or PDF) and the exact command used to > > > call ghostscript. > > > > This was indeed a problem and ocrmypdf upstream has fixed it in the > > latest release. > > Ah, great that the cause has been located! > > ...and happy that my guess was correct :-) Not quite? ocrmypdf did not use any undocumented ghostscript calls. It followed an example from Ghostscript's documentation almost verbatim to generate a .ps from a template that tells Ghostscript to insert an ICC profile, referenced by filename. Ghostscript 9.28 is disabling access to all files from a .ps file unless safety is explicitly disabled. So nothing undocumented or exploitable was happening. (But it does make sense for Ghostscript to make the change.) It does mean any other software that uses Ghostscript to generate PDF/X, PDF/E, or PDF/A is likely going to break as well with this release. > They've issued another pre-release yesterday - I hope to package that > soon, maybe today. > > > - Jonas > > -- > * Jonas Smedegaard - idealist & Internet-arkitekt > * Tlf.: +45 40843136 Website: http://dr.jones.dk/ > > [x] quote me freely [ ] ask before reusing [ ] keep private
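To make the file-access change concrete, here is a minimal sketch of how a caller might build a Ghostscript PDF/A command line that explicitly permits reading the ICC profile. The `--permit-file-read` option belongs to later Ghostscript releases (9.50 and newer, as far as I know) and may not match the 9.28 pre-releases discussed in this thread. The function name, the file names, and the particular set of switches are illustrative assumptions, not ocrmypdf's actual invocation.

```python
def pdfa_command(input_pdf, output_pdf, pdfa_def_ps, icc_profile):
    """Build (but do not run) a Ghostscript argv for PDF/A conversion."""
    return [
        "gs",
        "-dBATCH",
        "-dNOPAUSE",
        "-dSAFER",
        # Assumption: allow the PostScript template to read only the ICC
        # profile, rather than disabling the sandbox with -dNOSAFER.
        f"--permit-file-read={icc_profile}",
        "-sDEVICE=pdfwrite",
        "-dPDFA=2",
        "-sColorConversionStrategy=RGB",
        f"-sOutputFile={output_pdf}",
        # The .ps template that names the ICC profile comes before the input.
        str(pdfa_def_ps),
        str(input_pdf),
    ]
```

Granting read access to a single file keeps the sandbox in place for everything else, which is the trade-off the Ghostscript change was aiming for.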
Bug#934035: ocrmypdf: FTBFS in stretch (failing tests)
The issue here is that we have an old version of ocrmypdf (4.3.5) with a backported version of Ghostscript (9.26) and the latter's behavior has changed in a way that breaks the test. I recommend disabling the test and documenting a caveat that certain metadata may not be preserved in output files. This is arguably a fairly minor loss of functionality. On Tue, Aug 6, 2019 at 3:48 AM Santiago Vila wrote: > Package: src:ocrmypdf > Version: 4.3.5-3 > Severity: serious > Tags: ftbfs > > Dear maintainer: > > I tried to build this package in stretch but it failed: > > > > [...] > debian/rules build-indep > dh build-indep --with python3,sphinxdoc --buildsystem=pybuild >dh_testdir -i -O--buildsystem=pybuild >dh_update_autotools_config -i -O--buildsystem=pybuild >dh_autoreconf -i -O--buildsystem=pybuild >dh_auto_configure -i -O--buildsystem=pybuild > I: pybuild base:184: python3.5 setup.py config > Skipping external program tests because of --force > running config >debian/rules override_dh_auto_build > make[1]: Entering directory '/<>' > mkdir -p debian/.debhelper > cp -R ocrmypdf debian/.debhelper > sed -i debian/.debhelper/ocrmypdf/__init__.py -e \ > "s|^VERSION =.*|VERSION = \"4.3.5\"|" > PYTHONPATH=debian/.debhelper sphinx-build docs html > Running Sphinx v1.4.9 > making output directory... > loading pickled environment... not yet created > building [mo]: targets for 0 po files that are out of date > building [html]: targets for 7 source files that are out of date > updating environment: 7 added, 0 changed, 0 removed > reading sources... [ 14%] cookbook > reading sources... [ 28%] errors > reading sources... [ 42%] index > reading sources... [ 57%] installation > reading sources... [ 71%] introduction > reading sources... [ 85%] languages > reading sources... [100%] security > > /<>/docs/installation.rst:2: WARNING: Duplicate explicit > target name: "docker". > /<>/docs/installation.rst:2: WARNING: Duplicate explicit > target name: "docker". 
> looking for now-outdated files... none found > pickling environment... done > checking consistency... /<>/docs/installation.rst:: WARNING: > document isn't included in any toctree > done > preparing documents... done > writing output... [ 14%] cookbook > writing output... [ 28%] errors > writing output... [ 42%] index > writing output... [ 57%] installation > writing output... [ 71%] introduction > writing output... [ 85%] languages > writing output... [100%] security > > generating indices... genindex > writing additional pages... search > copying images... [100%] bitmap_vs_svg.svg > > copying static files... WARNING: html_static_path entry > '/<>/docs/_static' does not exist > done > copying extra files... done > dumping search index in English (code: en) ... done > dumping object inventory... done > build succeeded, 4 warnings. > dh_auto_build -O--buildsystem=pybuild > I: pybuild base:184: /usr/bin/python3 setup.py build > Skipping external program tests because of --force > running build > running build_py > creating /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/unpaper.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/hocrtransform.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/pdfa.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/ghostscript.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/leptonica.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/tesseract.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/main.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/__init__.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/qpdf.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/__main__.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > copying ocrmypdf/pageinfo.py -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf > creating /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf/data > 
copying ocrmypdf/data/sRGB.icc -> > /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf/data > generating cffi module > '/<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf/lib/_leptonica.py' > creating /<>/.pybuild/pythonX.Y_3.5/build/ocrmypdf/lib > make[1]: Leaving directory '/<>' >debian/rules override_dh_auto_test > make[1]: Entering directory '/<>' > python3 setup.py test > Checking for tesseract >= 3.03... > Found tesseract 3.04.01 > Checking for gs >= 9.15... > Found gs 9.26 > Checking for unpaper >= 6.1... > Found unpaper 6.1 > Checking for qpdf >= 5.0.0... > Found qpdf 6.0.0 > running pytest > running egg_info > creating ocrmypdf.egg-info > writing requirements to ocrmypdf.egg-info/requires.txt > writing ocrmypdf.egg-info/PKG-INFO > writing top-level names to ocrmypdf.egg-info/top_level.txt > writing entry points to ocrmypdf.egg-info/entry_points.txt > writing dependency_links to
Bug#903627: ocrmypdf: contains workaround for old version of python3-ruffus which should not be used with current python3-ruffus
I backported the fixes related to python3-ruffus 2.7, python 3.7 support, and a few other minor changes from 7.0.0. I released it just now as 6.2.2, so that should take care of it. Let me know if there are any further issues. On Thu, 12 Jul 2018 at 01:03 Sean Whitton wrote: > Package: ocrmypdf > Version: 6.2.0-1 > Severity: serious > Tags: ftbfs > X-debbugs-cc: j...@purplerock.ca > > OCRmyPDF contains a workaround for a bug in python3-ruffus <=2.6.3 that > upstream reports should not be used with python3-ruffus >=2.7 (see > changelog entry for 4.1.2-1 upload). > > python3-ruffus 2.7 was just uploaded to Debian, so ocrmypdf is now > buggy, and indeed unbuildable. > > The current upstream release of OCRmyPDF, 7.0.0, will not be reaching > Debian unstable for some time: a new dependency, pikepdf, will target > experimental. So ideally we would patch the workaround out of OCRmyPDF > 6.2.0. CCing upstream to request advice on how to do this. > > -- > Sean Whitton >
Bug#894068: ocrmypdf: New dependency on PyMuPDF for v6.0.0
Package: ocrmypdf Version: v6.0.0 Severity: serious Tags: newcomer Justification: fails to build from source (but built successfully in the past) Dear Sean, In v6.0.0, which addresses and hopefully fixes #888917, I have introduced a new dependency on PyMuPDF (Python bindings for MuPDF). Unfortunately PyMuPDF isn't available in Debian as yet (I have checked there is no python3-pymupdf). The build procedure should go like this: - download/unpack MuPDF to mupdf/ - download/unpack PyMuPDF to pymupdf/ - cp pymupdf/fitz/_mupdf_config.h mupdf/include/mupdf/fitz/config.h - export CFLAGS=-fPIC - make HAVE_X11=no HAVE_GLFW=no HAVE_GLUT=no - patch pymupdf/setup.py to point library_dirs and include_dirs to the output of mupdf/ build The reason for this circumlocution is that the vendor of MuPDF, Artifex, does not provide or support dynamic libraries or a stable ABI, and compiling the Python bindings requires a dynamic library. Perhaps as a way to warn people about their stance, they don't enable -fPIC by default and link their application statically. This means that unfortunately, one cannot link to libmupdf-dev (and actually, I'm not sure if libmupdf-dev serves any purpose at all, unless it were rebuilt with -fPIC). Certainly if the maintainers of this package could be persuaded to build it with -fPIC that would make this much easier. I did try to build it with Debian sid against the libmupdf-dev library. The error, as with Ubuntu, is: relocation R_X86_64_PC32 against symbol `fz_empty_irect' can not be used when making a shared object; recompile with -fPIC The make options and replacement of the header file in mupdf are all disabling features unnecessary for PyMuPDF's purposes. It shrinks the binary from 30 MB to 3 MB. The PyMuPDF developers describe their build process here: https://github.com/rk700/PyMuPDF/wiki/Ubuntu-Installation-Experience I'm happy to help with the packaging of this dependency, and I got the process working for Python binary wheels already. 
However, I don't really know much about Debian processes and policy. Regards, James -- System Information: Debian Release: buster/sid APT prefers unstable APT policy: (500, 'unstable') Architecture: amd64 (x86_64) Kernel: Linux 4.4.119-boot2docker (SMP w/1 CPU core) Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968), LANGUAGE=C (charmap=ANSI_X3.4-1968) Shell: /bin/sh linked to /usr/bin/dash Init: unable to detect Versions of packages ocrmypdf depends on: pn ghostscript pn icc-profiles-free pn liblept5 ii python3 3.6.5~rc1-1 pn python3-cffi-backend-api-max pn python3-cffi-backend-api-min pn python3-img2pdf pn python3-pil ii python3-pkg-resources 39.0.1-1 pn python3-pypdf2 pn python3-reportlab pn python3-ruffus pn qpdf pn tesseract-ocr ii zlib1g 1:1.2.8.dfsg-5 Versions of packages ocrmypdf recommends: pn unpaper Versions of packages ocrmypdf suggests: pn img2pdf pn ocrmypdf-doc pn python-watchdog
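The fully specified parts of the build procedure listed earlier in this message (the header copy and the `-fPIC` make invocation) can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, the download and `setup.py` patch steps are left out because the message does not spell them out, and running `make` inside `mupdf/` is an assumption.

```python
import os
import subprocess


def pymupdf_build_steps(mupdf_dir="mupdf", pymupdf_dir="pymupdf"):
    """Return (argv, cwd, extra_env) tuples for the steps that are spelled out."""
    header_src = os.path.join(pymupdf_dir, "fitz", "_mupdf_config.h")
    header_dst = os.path.join(mupdf_dir, "include", "mupdf", "fitz", "config.h")
    return [
        # Replace MuPDF's config header with PyMuPDF's trimmed-down one.
        (["cp", header_src, header_dst], ".", {}),
        # Build MuPDF as position-independent code, without the GUI backends.
        # Assumption: make is run from inside the mupdf/ directory.
        (
            ["make", "HAVE_X11=no", "HAVE_GLFW=no", "HAVE_GLUT=no"],
            mupdf_dir,
            {"CFLAGS": "-fPIC"},
        ),
    ]


def run_steps(steps):
    """Execute each step, adding its extra variables to the environment."""
    for argv, cwd, extra_env in steps:
        subprocess.run(argv, cwd=cwd, env={**os.environ, **extra_env}, check=True)
```

Keeping the steps as data makes it easy to print them for review before running anything, which may be useful when adapting them to a Debian packaging workflow.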
Bug#888917: ocrmypdf fails to run it's testsuite
Upstream here. The reason the suite fails like that is that mandatory-for-testing dependencies were also removed. The test suite runs on Travis CI in 10-12 minutes. On Debian CI, 15 minutes. For comparison, ffmpeg, another compute-intensive CLI program, takes 10 minutes. This is an OCR program and OCR takes a long time. There are opportunities to speed up testing on my end but no low-hanging fruit without removing tests. I've done the obvious: use all cores, use caches and dummies where possible. Some OCR on the fly is essential because Tesseract is complex enough that output is not identical across platforms. Preserving the dynamically created tests/cache/ folder between test runs, if possible in Debian CI, would speed it up a lot. I could mark a subset of essential tests for packagers so that Debian CI can specify it only wants those. There are a number of tests that, once they pass upstream testing (macOS and Ubuntu), are very unlikely to then somehow fail downstream in Debian.