On 15.04.2020 at 23:12, Peter Murray-Rust wrote:
Do you know whether the figures are characters or bitmap images? (We find a
lot of this in scientific publications.) If they are characters with
non-standard codes, then there's probably only a small number of
characters in each font. We mapped this for some thousands of maths symbols
and I'd guess it's a smaller problem for chess. Alternatively, the pieces
may be small bitmapped images. Our AMI3 tool uses PDFBox, stores all
images and removes duplicates, recording the coordinates. It's open source
and you are welcome to try it.
Mail me if so.
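To make the "store all images, remove duplicates, record the coordinates"
idea concrete, here is a minimal sketch in plain Java. It is not AMI3's
actual code, and it does not call PDFBox: the class ImageDeduper and its
methods are hypothetical names, and it assumes you already have the raw
bytes and the page position of each image from your extraction step. It
also assumes duplicates are byte-identical, which may not hold if the same
piece is embedded with different encodings.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of AMI3 or PDFBox.
class ImageDeduper {
    // SHA-256 hex digest of the image bytes -> list of {page, x, y} placements
    private final Map<String, List<double[]>> placements = new LinkedHashMap<>();

    /** Records one occurrence; returns true if these bytes were not seen before. */
    boolean add(byte[] imageBytes, int page, double x, double y) {
        String key = sha256Hex(imageBytes);
        List<double[]> list = placements.get(key);
        boolean isNew = (list == null);
        if (isNew) {
            list = new ArrayList<>();
            placements.put(key, list);
        }
        list.add(new double[] {page, x, y});
        return isNew;
    }

    /** Number of distinct images seen so far. */
    int uniqueCount() {
        return placements.size();
    }

    /** All recorded placements of an image with exactly these bytes. */
    List<double[]> placementsOf(byte[] imageBytes) {
        return placements.getOrDefault(sha256Hex(imageBytes), new ArrayList<>());
    }

    /** Content hash used as the identity of an image. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For a chess diagram you would expect only a dozen or so unique images
(one per piece and colour, perhaps doubled for light and dark squares),
each with many placements, so the unique set is small enough to label by
hand and the coordinates then give you the board position.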
Or share such a file (upload it to a file-hosting service).
If you can't, open it with PDFDebugger and look around until you find
the font resources, and tell us what you see.
Tilman
P.
On Wed, Apr 15, 2020 at 9:19 PM Fran Rojas <froja...@gmail.com> wrote:
Hello Tilman,
I have just tested the PDF with Adobe Reader and it did not recognize the
characters either.
Then, what would the strategy be?
Is there any way for the library to return the images of unrecognized
characters so that the application could make an effort to interpret them
(via a specialized OCR or something like that)?
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org