Dear all, I have an image-based PDF file (in Greek).
I would appreciate your help in getting output similar to what pdfalto produces but, more importantly, with the position, width, and height information on a per-character basis. So far, pdfalto treats each word as a token, so its output is on a per-word basis: https://github.com/kermitt2/pdfalto/issues/34

How would you approach this with https://github.com/tesseract-ocr? Which command and which parameters would you use?

Thank you very much in advance.

