Bug#813562: Test suite failure

2016-02-21 Thread Sean Whitton
On Sun, Feb 21, 2016 at 12:46:52PM +, James R Barlow wrote:
> Great news. 4.0.2 is ready now. 

Sweet.  Package build in progress.

> I did find while updating my Docker image that Debian stretch's
> version of Ghostscript (gs 9.16~dfsg-2.1) produces error messages and
> blank pages on JPEG 2000 images. It's fixed in Sid, but the fix hasn't
> moved downstream yet.

Thanks for letting me know -- I've added a dependency bound on the
version in Sid, so that ocrmypdf won't migrate to stretch until
ghostscript does.

> Thanks again for doing this.

No, thank you for your help with the process.  I'm very grateful to be
able to use ocrmypdf as part of my effort to work paperlessly.  It works
really well combined with Recoll desktop search.

-- 
Sean Whitton


signature.asc
Description: PGP signature


Bug#813562: Test suite failure

2016-02-21 Thread James R Barlow
Great news. 4.0.2 is ready now.

I did find while updating my Docker image that Debian stretch's version of
Ghostscript (gs 9.16~dfsg-2.1) produces error messages and blank pages on
JPEG 2000 images. It's fixed in Sid, but the fix hasn't moved downstream
yet.

Thanks again for doing this.

On Sat, 20 Feb 2016 at 17:05 Sean Whitton  wrote:

> Hello,
>
> On Sat, Feb 20, 2016 at 03:27:00AM +, James R Barlow wrote:
> > Thanks for your help. Output order is due to multiprocessing.
>
> No problem.
>
> > That nailed it. tesseract 3.04.01 changed its output when asked to
> > determine page orientation. It's an improvement, but it breaks parsing.
> >
> > I will throw together a patch to make the appropriate distinctions.
>
> I thought you might appreciate knowing that version 4.0.2rc1 builds fine
> in a clean Debian Sid chroot, and the test suite passes as part of the
> package build.
>
> I'm looking forward to 4.0.2!  (Release candidates are not generally
> uploaded to Debian.)
>
> --
> Sean Whitton
>


Bug#813562: Test suite failure

2016-02-20 Thread Sean Whitton
Hello,

On Sat, Feb 20, 2016 at 03:27:00AM +, James R Barlow wrote:
> Thanks for your help. Output order is due to multiprocessing.

No problem.

> That nailed it. tesseract 3.04.01 changed its output when asked to
> determine page orientation. It's an improvement, but it breaks parsing.
> 
> I will throw together a patch to make the appropriate distinctions.

I thought you might appreciate knowing that version 4.0.2rc1 builds fine
in a clean Debian Sid chroot, and the test suite passes as part of the
package build.

I'm looking forward to 4.0.2!  (Release candidates are not generally
uploaded to Debian.)

-- 
Sean Whitton




Bug#813562: Test suite failure

2016-02-19 Thread James R Barlow
Thanks for your help. Output order is due to multiprocessing.

That nailed it. tesseract 3.04.01 changed its output when asked to
determine page orientation. It's an improvement, but it breaks parsing.

I will throw together a patch to make the appropriate distinctions.


$ tess-3.04.01 -psm 0 tests/resources/linn-west.jpg stdout
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 29.34
Script: Latin
Script confidence: 45.33

$ tess-3.04.00 -psm 0 tests/resources/linn-west.jpg stdout
Orientation: 3
Orientation in degrees: 90
Orientation confidence: 29.34
Script: 1
Script confidence: 45.33
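A minimal sketch of the kind of distinction the patch needs, in Python since ocrmypdf is Python. It assumes, from the two outputs above rather than from tesseract's documentation, that the 3.04.01 "Rotate" value means the same thing as the 3.04.00 "Orientation in degrees" value (both are 90 for linn-west.jpg), while "Orientation in degrees" itself changed meaning (270 vs 90 for the same image). parse_osd is a hypothetical helper, not ocrmypdf's actual parser.

```python
def parse_osd(text):
    """Hypothetical helper: normalize tesseract -psm 0 output from 3.04.00 or 3.04.01.

    Assumption (from one sample image): 3.04.01 "Rotate" == 3.04.00
    "Orientation in degrees", i.e. the angle needed to make the page upright.
    """
    fields = {}
    for line in text.splitlines():
        # Split "Key: value" lines on the first colon only.
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()

    if 'Rotate' in fields:
        # 3.04.01 format: "Orientation in degrees" now reports the opposite
        # sense (270 where 3.04.00 said 90), so use "Rotate" instead.
        rotate = int(fields['Rotate'])
    else:
        # 3.04.00 format: no "Rotate" line; "Orientation in degrees" is used.
        rotate = int(fields['Orientation in degrees'])

    return {
        'rotate': rotate,
        'confidence': float(fields['Orientation confidence']),
    }
```

Fed either of the two outputs above, parse_osd returns {'rotate': 90, 'confidence': 29.34}.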



On Fri, Feb 19, 2016 at 16:28 Sean Whitton  wrote:

> Hello,
>
> On Fri, Feb 19, 2016 at 10:45:51PM +, James R Barlow wrote:
> > In any case, could you try running this:
> > ocrmypdf --rotate-pages tests/resources/cardinal.pdf out.pdf
> >
> > In cardinal.pdf the same page is rotated in each cardinal direction.
> > out.pdf should have all pages facing up. Is this the case? The output
> > will also give information on rotation status:
> > INFO - 1: page is facing ⇧, confidence 18.69
> > INFO - 3: page is facing ⇩, confidence 21.86 - correcting rotation
> > INFO - 4: page is facing ⇦, confidence 20.71 - correcting rotation
> > INFO - 2: page is facing ⇨, confidence 21.63 - correcting rotation
> > INFO - 3: rotating image layer 180 degrees
> > INFO - 2: rotating image layer 90 degrees
> > INFO - 4: rotating image layer 270 degrees
>
> No, it gets it wrong.  Result attached, and the output:
>
> ,
> | root@artemis:/build/ocrmypdf-4.0.1# ocrmypdf --rotate-pages tests/resources/cardinal.pdf out.pdf
> | INFO -1: page is facing ⇧, confidence 18.69
> | INFO -2: page is facing ⇦, confidence 21.63 - correcting rotation
> | INFO -3: page is facing ⇩, confidence 21.86 - correcting rotation
> | INFO -4: page is facing ⇨, confidence 20.71 - correcting rotation
> | INFO -2: rotating image layer 270 degrees
> | INFO -3: rotating image layer 180 degrees
> | INFO -4: rotating image layer 90 degrees
> `
>
> (note that the order it processes the pages in is different to your
> example)
>
> > It would also help to try in python3:
> >
> > >>> import ocrmypdf.leptonica as lp
> > >>> lp.getLeptonicaVersion()
> >
> > ...to see if there's anything unusual about how debian sid is reporting the
> > leptonica version.
>
> ,
> | root@artemis:/build/ocrmypdf-4.0.1# cd /usr/lib/python3/dist-packages
> | root@artemis:/usr/lib/python3/dist-packages# python3
> | Python 3.5.1+ (default, Jan 13 2016, 15:09:18)
> | [GCC 5.3.1 20160101] on linux
> | Type "help", "copyright", "credits" or "license" for more information.
> | >>> import ocrmypdf.leptonica as lp
> | >>> lp.getLeptonicaVersion()
> | 'leptonica-1.73'
> `
>
> --
> Sean Whitton
>
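One way to see why the rotation test failed: in the log above, page 3 got the same 180 degree correction as in the expected output, but pages 2 and 4 swapped their 90 and 270 degree corrections. That is the pattern a sign error produces, and it is consistent with reading the 3.04.01 "Orientation in degrees" value (270) as if it were the 3.04.00 one (90), because (360 - x) % 360 leaves 0 and 180 unchanged and swaps 90 and 270. The toy model below checks this with small 0/1 grids and a hypothetical rotate_cw helper; it is an illustration, not how ocrmypdf rotates image layers.

```python
def rotate_cw(grid, degrees):
    """Hypothetical helper: rotate a list-of-rows grid clockwise by a multiple of 90."""
    for _ in range((degrees // 90) % 4):
        # Reverse the rows, then transpose: one 90-degree clockwise turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


# Asymmetric "page", so each of the four orientations is distinguishable.
UPRIGHT = [[1, 1, 1],
           [1, 0, 0],
           [0, 0, 0]]


def check(correction_for):
    """For each burned-in skew, apply a correction and report whether the page ends upright."""
    results = {}
    for skew in (0, 90, 180, 270):
        scanned = rotate_cw(UPRIGHT, skew)   # page as it appears in the PDF
        fix = correction_for(skew)           # angle we rotate it by to correct it
        results[skew] = rotate_cw(scanned, fix) == UPRIGHT
    return results


right = check(lambda skew: (360 - skew) % 360)  # correct sign
flipped = check(lambda skew: skew % 360)        # sign error
```

Here right maps every skew to True, while flipped is True only for 0 and 180, matching a run where pages 1 and 3 come out fine and pages 2 and 4 do not.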


Bug#813562: Test suite failure

2016-02-19 Thread James R Barlow
I ran into a similar failure because leptonica 1.71 has an integer overflow
bug in the function pixCorrelationBinary which I use only in the test suite
to check if some output PDFs visually resemble an expected reference PDF. I
rewrote that function in Python for the older versions. The relevant code
is ocrmypdf.leptonica.Pix.correlation_binary. I added a test that only
exercises pixCorrelationBinary (test_monochrome_correlation), and this one
passed.
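For reference, a pure-Python binary correlation of this kind can be short. This sketch assumes leptonica's documented definition for pixCorrelationBinary, |A AND B|**2 / (|A| * |B|) where |X| counts foreground pixels, and represents images as equal-sized lists of 0/1 rows. It is not ocrmypdf's Pix.correlation_binary, the thread does not say where the 1.71 overflow occurs, and returning 0.0 for an image with no foreground pixels is a choice made here.

```python
def correlation_binary(a, b):
    """Sketch: binary correlation |A AND B|**2 / (|A| * |B|) on lists of 0/1 rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError('images must have the same dimensions')

    # Foreground (1) pixel counts of each image.
    count_a = sum(map(sum, a))
    count_b = sum(map(sum, b))
    if count_a == 0 or count_b == 0:
        return 0.0  # choice made in this sketch for empty images

    # Pixels that are foreground in both images.
    count_ab = sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

    # Python ints are arbitrary precision, so this product cannot overflow
    # the way a fixed-width C integer product can on large images.
    return count_ab * count_ab / (count_a * count_b)
```

Identical images score 1.0 and images with no foreground overlap score 0.0; test_autorotate in the failure output below requires more than 0.80 for each corrected page.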

I checked that the tests can pass in the Docker version (they are slightly
broken there for an unrelated reason). The Docker image is based on Debian
stretch, which has leptonica 1.73 (a good version) and the same set of
libraries as yours. The one difference is tesseract 3.04.01 vs. 3.04.00, but I
compiled tesseract 3.04.01 and found that it made no difference.

In any case, could you try running this:
ocrmypdf --rotate-pages tests/resources/cardinal.pdf out.pdf

In cardinal.pdf the same page is rotated in each cardinal direction.
out.pdf should have all pages facing up. Is this the case? The output will
also give information on rotation status:
INFO - 1: page is facing ⇧, confidence 18.69
INFO - 3: page is facing ⇩, confidence 21.86 - correcting rotation
INFO - 4: page is facing ⇦, confidence 20.71 - correcting rotation
INFO - 2: page is facing ⇨, confidence 21.63 - correcting rotation
INFO - 3: rotating image layer 180 degrees
INFO - 2: rotating image layer 90 degrees
INFO - 4: rotating image layer 270 degrees

That would help establish whether something is actually wrong or the test
case is somehow at fault.

It would also help to try in python3:

>>> import ocrmypdf.leptonica as lp
>>> lp.getLeptonicaVersion()

...to see if there's anything unusual about how debian sid is reporting the
leptonica version.
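Since the Python fallback is only wanted for older leptonica builds, the exact string getLeptonicaVersion() returns matters. A sketch of such a check, assuming the string has the 'leptonica-1.73' form shown in this thread. The cutoff is also an assumption: the thread names 1.71 as buggy and 1.73 as good but says nothing about 1.72. Both function names are hypothetical.

```python
import re


def parse_leptonica_version(version_string):
    """Hypothetical helper: turn 'leptonica-1.73' into (1, 73); raise ValueError otherwise."""
    # strip() also tolerates the leading space seen in `tesseract --version` output.
    match = re.fullmatch(r'leptonica-(\d+)\.(\d+)', version_string.strip())
    if match is None:
        raise ValueError('unrecognized leptonica version: %r' % version_string)
    return int(match.group(1)), int(match.group(2))


def needs_python_correlation(version_string):
    """Hypothetical policy: use the pure-Python fallback on 1.71 and older."""
    return parse_leptonica_version(version_string) <= (1, 71)
```

An unexpected format raises ValueError instead of silently picking a code path, which is the kind of unusual version reporting this check is meant to surface.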


On Fri, 19 Feb 2016 at 12:04 Sean Whitton  wrote:

> Hello,
>
> On Fri, Feb 19, 2016 at 07:11:32AM +, James R Barlow wrote:
> > What version of leptonica is installed?
> > tesseract --version will report this.
>
> From within my Sid chroot:
>
> root@artemis:/build/ocrmypdf-4.0.1# tesseract --version
> tesseract 3.04.01
>  leptonica-1.73
>   libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
>
> > Also what's the file name for liblept?
>
> The Debian liblept package provides:
>
> /usr/lib/liblept.so.5
> /usr/lib/liblept.so.5.0.0
>
> --
> Sean Whitton
>


Bug#813562: Test suite failure

2016-02-19 Thread Sean Whitton
Hello,

On Fri, Feb 19, 2016 at 07:11:32AM +, James R Barlow wrote:
> What version of leptonica is installed?
> tesseract --version will report this.

From within my Sid chroot:

root@artemis:/build/ocrmypdf-4.0.1# tesseract --version
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0

> Also what's the file name for liblept?

The Debian liblept package provides:

/usr/lib/liblept.so.5
/usr/lib/liblept.so.5.0.0

-- 
Sean Whitton




Bug#813562: Test suite failure

2016-02-18 Thread James R Barlow
I have seen a similar problem.

What version of leptonica is installed?
tesseract --version will report this.
Also what's the file name for liblept?

On Thu, Feb 18, 2016 at 21:29 Sean Whitton  wrote:

> Dear James,
>
> OCRmyPDF's test suite is currently failing under a freshly-installed
> Debian Sid chroot.  I've attached the output to this e-mail.
>
> Since the test suite worked on yesterday's version of Debian Sid, I
> think that this must be due to a bug introduced in a new version of one
> the dependencies.  That means it's my job to figure out what the problem
> is, and it is unlikely to be a bug in OCRmyPDF for you to fix.  I'm
> e-mailing you just in case the problem is obvious to you from reading
> the output.
>
> Thanks.
>
> --
> Sean Whitton
>


Bug#813562: Test suite failure

2016-02-18 Thread Sean Whitton
Dear James,

OCRmyPDF's test suite is currently failing under a freshly-installed
Debian Sid chroot.  I've attached the output to this e-mail.

Since the test suite worked on yesterday's version of Debian Sid, I
think that this must be due to a bug introduced in a new version of one
of the dependencies.  That means it's my job to figure out what the problem
is, and it is unlikely to be a bug in OCRmyPDF for you to fix.  I'm
e-mailing you just in case the problem is obvious to you from reading
the output.

Thanks.

-- 
Sean Whitton
= test session starts ==
platform linux -- Python 3.4.4, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /build/ocrmypdf-4.0.1, inifile: pytest.ini
collected 44 items

tests/test_hocrtransform.py .
tests/test_main.py ...FF..
tests/test_pageinfo.py 

=== FAILURES ===
 test_autorotate[hocr] _

spoof_tesseract_cache = {'BIBINPUTS': '/home/swhitton/doc:/home/swhitton/doc/papers:', 'BROWSER': 'iceweasel', 'BUILDRESULTGID': '1000', 'BUILDRESULTUID': '1000', ...}
renderer = 'hocr'

    @pytest.mark.parametrize('renderer', [
        'hocr',
        'tesseract',
        ])
    def test_autorotate(spoof_tesseract_cache, renderer):
        import ocrmypdf.ghostscript as ghostscript
        import logging

        gslog = logging.getLogger()

        # cardinal.pdf contains four copies of an image rotated in each cardinal
        # direction - these ones are "burned in" not tagged with /Rotate
        out = check_ocrmypdf('cardinal.pdf', 'test_autorotate_%s.pdf' % renderer,
                             '-r', '-v', '1', env=spoof_tesseract_cache)
        for n in range(1, 4+1):
            correlation = check_monochrome_correlation(
                reference_pdf=_infile('cardinal.pdf'),
                reference_pageno=1,
                test_pdf=out,
                test_pageno=n)
>           assert correlation > 0.80
E           assert 0.01808749884366989 > 0.8

tests/test_main.py:310: AssertionError
- Captured stdout call -
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png
__ test_autorotate[tesseract] __

spoof_tesseract_cache = {'BIBINPUTS': '/home/swhitton/doc:/home/swhitton/doc/papers:', 'BROWSER': 'iceweasel', 'BUILDRESULTGID': '1000', 'BUILDRESULTUID': '1000', ...}
renderer = 'tesseract'

    @pytest.mark.parametrize('renderer', [
        'hocr',
        'tesseract',
        ])
    def test_autorotate(spoof_tesseract_cache, renderer):
        import ocrmypdf.ghostscript as ghostscript
        import logging

        gslog = logging.getLogger()

        # cardinal.pdf contains four copies of an image rotated in each cardinal
        # direction - these ones are "burned in" not tagged with /Rotate
        out = check_ocrmypdf('cardinal.pdf', 'test_autorotate_%s.pdf' % renderer,
                             '-r', '-v', '1', env=spoof_tesseract_cache)
        for n in range(1, 4+1):
            correlation = check_monochrome_correlation(
                reference_pdf=_infile('cardinal.pdf'),
                reference_pageno=1,
                test_pdf=out,
                test_pageno=n)
>           assert correlation > 0.80
E           assert 0.01808749884366989 > 0.8

tests/test_main.py:310: AssertionError
- Captured stdout call -
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png
 2 failed, 42 passed in 667.14 seconds =

