Do you know whether the figures are characters or bitmap images? (We find a lot of this in scientific publications.) If they are characters with non-standard codes, then there's probably only a small number of characters in each font. We mapped some thousands of maths symbols this way, and I'd guess it's a smaller problem for chess. Alternatively, the pieces may be small bitmapped images. Our AMI3 tool uses PDFBox; it stores all images, removes duplicates, and records the coordinates. It's open source and you are welcome to try it. Mail me if so.
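
To give a rough idea of the "store images, remove duplicates, record coordinates" step, here is a minimal sketch in plain Java. It is not AMI3's actual code: the class `GlyphImageIndex` and its methods are hypothetical names, and it assumes the image bytes and page coordinates have already been pulled out of the PDF (for example during a PDFBox pass over the page content). Two images count as duplicates only if their bytes are identical, which is a simplifying assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of AMI3 or PDFBox.
class GlyphImageIndex {

    /** One occurrence of an image: page number and position in page units. */
    static final class Placement {
        final int page;
        final float x;
        final float y;

        Placement(int page, float x, float y) {
            this.page = page;
            this.x = x;
            this.y = y;
        }
    }

    // One entry per distinct image, keyed by the SHA-256 hash of its bytes.
    private final Map<String, List<Placement>> placementsByHash = new LinkedHashMap<>();

    /** Records an occurrence; returns true if these bytes were not seen before. */
    boolean add(byte[] imageBytes, int page, float x, float y) {
        String key = sha256Hex(imageBytes);
        List<Placement> placements = placementsByHash.get(key);
        boolean isNew = (placements == null);
        if (isNew) {
            placements = new ArrayList<>();
            placementsByHash.put(key, placements);
        }
        placements.add(new Placement(page, x, y));
        return isNew;
    }

    /** Number of distinct images stored so far. */
    int distinctImages() {
        return placementsByHash.size();
    }

    /** All recorded positions of the image with these exact bytes. */
    List<Placement> placementsOf(byte[] imageBytes) {
        List<Placement> found = placementsByHash.get(sha256Hex(imageBytes));
        return found == null ? Collections.<Placement>emptyList() : found;
    }

    private static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With something like this, each distinct chess piece image would only need to be identified once (by hand or by OCR), and the recorded placements would then give its positions on every page.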
P.

On Wed, Apr 15, 2020 at 9:19 PM Fran Rojas <froja...@gmail.com> wrote:

> Hello Tilman,
>
> I have just tested the PDF with Adobe Reader and it did not recognize the
> characters either.
>
> Then, what would the strategy be?
> Is there any way that the library returns the images of unrecognized
> characters so that the application could make an effort to interpret them
> (via a specialized OCR or something like that)?

--
"I always retain copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same".

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069