Yeah, I remember that one. I even tried to find the problem and then did something else. Or maybe the IDE crashed, so the window was no longer open and I forgot.

I did not even get far enough to find out whether the old text extraction or the new one was the "good" one.

Coincidentally, there is an issue
https://issues.apache.org/jira/browse/PDFBOX-4909
that may make it easier to get back to the old height calculation.

Tilman (works for free here)

On 09.07.2020 at 04:10, Manuel Aristarán wrote:
Hi!

I'm one of the maintainers of Tabula [0].

Due to some changes in PDFBox, we've been running on 2.0.15 for some time
now, and we would love to keep Tabula updated with the newest version of
our favorite library :)

Last year, Tilman Hausherr graciously submitted a PR [1] that updated
PDFBox to 2.0.19. Unfortunately, it broke a few tests, apparently because
the font measurement heuristics had changed. Text measurement is critical
for Tabula, so we chose to stick with the latest compatible version.

We want to offer a USD 200 bounty to fix the issue. We run entirely on
donations and have funds available for this [2]. The goal is to update
Tabula to use PDFBox 2.0.20, and the requirement is that the entire test
suite passes.

If you're interested, please get in touch with me at man...@jazzido.com

Thanks!


[0] https://tabula.technology
[1] https://github.com/tabulapdf/tabula-java/pull/325
[2] https://opencollective.com/tabulapdf

--
Manuel Aristarán
http://jazzido.com