Do you know whether the figures are characters or bitmap images? (We find a
lot of this in scientific publications.) If they are characters with
non-standard codes, then there's probably only a small number of
characters in each font. We mapped this for some thousands of maths symbols
and I'd guess it's a smaller problem for chess. Alternatively, the pieces
may be small bitmapped images. Our AMI3 tool uses PDFBox; it stores all
images, removes duplicates, and records the coordinates. It's open source
and you are welcome to try it.
Mail me if so.
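
To give an idea of the "remove duplicates, record coordinates" step, here is
a rough sketch. It is not AMI3's actual code: the class and method names
(ImageDeduplicator, Placement, add, placementsOf) are made up for
illustration, and it assumes you have already pulled each image's bytes and
its page position out of the PDF (for example with PDFBox), which is not
shown. Identical bitmaps are grouped by a SHA-256 hash of their bytes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not AMI3 code): groups identical images by content hash. */
final class ImageDeduplicator {

    /** Where one copy of an image was drawn: page number and x/y position. */
    static final class Placement {
        final int page;
        final float x;
        final float y;

        Placement(int page, float x, float y) {
            this.page = page;
            this.x = x;
            this.y = y;
        }
    }

    // SHA-256 hex digest of the image bytes -> every place that image occurs.
    private final Map<String, List<Placement>> placementsByHash = new LinkedHashMap<>();

    /** Records one occurrence; returns true if these bytes were not seen before. */
    boolean add(byte[] imageBytes, int page, float x, float y) {
        String key = sha256Hex(imageBytes);
        boolean isNew = !placementsByHash.containsKey(key);
        placementsByHash.computeIfAbsent(key, k -> new ArrayList<>())
                        .add(new Placement(page, x, y));
        return isNew;
    }

    /** Number of distinct images seen so far. */
    int uniqueImageCount() {
        return placementsByHash.size();
    }

    /** All recorded positions of an image with exactly these bytes. */
    List<Placement> placementsOf(byte[] imageBytes) {
        List<Placement> found = placementsByHash.get(sha256Hex(imageBytes));
        return found == null ? Collections.<Placement>emptyList() : found;
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

For a chess book you would expect only a dozen or so unique images (the
pieces on light and dark squares), each with many placements, so the
application only has to identify each unique bitmap once. Note that exact
byte hashing will not merge images that merely look the same but were
encoded differently.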

P.


On Wed, Apr 15, 2020 at 9:19 PM Fran Rojas <froja...@gmail.com> wrote:

> Hello Tilman,
>
> I have just tested the PDF with Adobe Reader and it did not recognize the
> characters either.
>
> Then, what would the strategy be?
> Is there any way for the library to return the images of unrecognized
> characters so that the application could make an effort to interpret them
> (via a specialized OCR or something like that)?

-- 
"I always retain copyright in my papers, and nothing in any contract I sign
with any publisher will override that fact. You should do the same".

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
