I think the hOCR output has an option to output bounding-box information per
character as well.
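A minimal Python sketch of this approach for the Greek PDF in the question. It rests on several assumptions. First, Tesseract reads raster images rather than PDFs, so each page would need to be rendered to an image first (for example with Poppler's `pdftoppm`). Second, the per-character option is assumed to be the `hocr_char_boxes` config variable, which as far as I know is available from Tesseract 4.1 onward. That variable wraps each character in a `<span class='ocrx_cinfo' title='x_bboxes l t r b; x_conf c'>`. Third, `ell` is assumed to be the installed Greek language data. The helper names `tesseract_hocr_cmd`, `CharBoxParser` and `char_boxes_from_hocr` are hypothetical, and the hOCR fragment at the end is invented for illustration.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# One entry per recognized character: (text, left, top, right, bottom).
# hOCR coordinates are image pixels with the origin at the top-left corner.
CharBox = Tuple[str, int, int, int, int]

# Matches the "x_bboxes l t r b" property inside an ocrx_cinfo title attribute.
_BBOX_RE = re.compile(r"x_bboxes\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def tesseract_hocr_cmd(image_path: str, out_base: str, lang: str = "ell") -> List[str]:
    """Build an argv list for subprocess.run (it is not executed here).

    Assumption: Tesseract 4.1+ where hocr_char_boxes=1 adds per-character
    boxes; "ell" is the Greek traineddata. Output is written to out_base + ".hocr".
    """
    return ["tesseract", image_path, out_base, "-l", lang,
            "-c", "hocr_char_boxes=1", "hocr"]


class CharBoxParser(HTMLParser):
    """Collects the text and x_bboxes of every <span class='ocrx_cinfo'>."""

    def __init__(self) -> None:
        super().__init__()
        self.boxes: List[CharBox] = []
        self._bbox: Optional[Tuple[int, ...]] = None  # set while inside an ocrx_cinfo span
        self._text = ""

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and "ocrx_cinfo" in (a.get("class") or "").split():
            m = _BBOX_RE.search(a.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = ""

    def handle_data(self, data):
        # Accumulate the character text only while an ocrx_cinfo span is open.
        if self._bbox is not None:
            self._text += data

    def handle_endtag(self, tag):
        # ocrx_cinfo spans hold only text, so the next </span> closes them.
        if tag == "span" and self._bbox is not None:
            self.boxes.append((self._text, *self._bbox))
            self._bbox = None


def char_boxes_from_hocr(hocr: str) -> List[CharBox]:
    parser = CharBoxParser()
    parser.feed(hocr)
    parser.close()
    return parser.boxes


# Hypothetical two-character hOCR fragment, for illustration only.
sample = (
    "<span class='ocrx_word' title='bbox 10 20 40 50'>"
    "<span class='ocrx_cinfo' title='x_bboxes 10 20 24 50; x_conf 98.5'>Α</span>"
    "<span class='ocrx_cinfo' title='x_bboxes 25 20 40 50; x_conf 97.1'>Β</span>"
    "</span>"
)
print(char_boxes_from_hocr(sample))
# [('Α', 10, 20, 24, 50), ('Β', 25, 20, 40, 50)]
```

Width and height per character follow as `right - left` and `bottom - top`. The parser relies only on the standard library's `html.parser`, so it does not require the hOCR file to be strictly well-formed XML.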

On Fri, 31 May 2019, 19:07 G. S., <[email protected]> wrote:

> Dear all,
>
> I have a PDF image file (in Greek).
>
> I would appreciate it if you could help me with how I could
>
> a) get output similar to what pdfalto produces,
>
> but, more importantly, get the position, width and height information on a
> per-character basis.
>
> Up to now, pdfalto treats each word as a token, so the output is on a
> per-word basis.
>
> https://github.com/kermitt2/pdfalto/issues/34
>
>
> Please tell me how you would approach this with
>
> https://github.com/tesseract-ocr
>
> Which command and which parameters would you use?
>
> Thank you very much in advance.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/32091990-88b9-426d-94f0-2c5278a9b9da%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/32091990-88b9-426d-94f0-2c5278a9b9da%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
