[tesseract-ocr] have the width, height, of each character of an image pdf file

G. S. Fri, 31 May 2019 06:37:27 -0700

Dear all,

I have an image PDF file (in Greek).


I would appreciate your help with how I could

a) get output similar to what pdfalto produces,

but, more importantly, with the position, width, and height information on a 
per-character basis.

So far, pdfalto treats each word as a token, so its output is on a 
per-word basis.

https://github.com/kermitt2/pdfalto/issues/34
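
To make the goal concrete, here is a minimal sketch of the kind of output I am after. The `CharBox` record, its field names, the `merge_to_word` helper, and the coordinates are all made up for illustration; they are not the format of pdfalto or tesseract. I am assuming pixel coordinates with the origin at the top-left.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CharBox:
    """One recognized character and its bounding box (hypothetical layout)."""
    char: str
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.bottom - self.top


def merge_to_word(chars: Iterable[CharBox]) -> CharBox:
    """Collapse per-character boxes into the single box a per-word tool reports."""
    boxes = list(chars)
    if not boxes:
        raise ValueError("need at least one character box")
    return CharBox(
        char="".join(b.char for b in boxes),
        left=min(b.left for b in boxes),
        top=min(b.top for b in boxes),
        right=max(b.right for b in boxes),
        bottom=max(b.bottom for b in boxes),
    )


# Made-up coordinates for the Greek word "και".
SAMPLE = [
    CharBox("κ", 100, 50, 112, 70),
    CharBox("α", 114, 54, 126, 70),
    CharBox("ι", 128, 54, 133, 70),
]

if __name__ == "__main__":
    # Per-character rows: this is the level of detail I need.
    for box in SAMPLE:
        print(box.char, box.left, box.top, box.width, box.height)
    # Per-word row: this is roughly what I get today.
    word = merge_to_word(SAMPLE)
    print(word.char, word.left, word.top, word.width, word.height)
```

The per-word box can be derived from the per-character boxes, but not the other way around, which is why the per-word output is not enough for me.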


Please tell me how you would approach this with

https://github.com/tesseract-ocr

Which command and which parameters would you use?

Thank you very much in advance.
