Thank you very much, Art Rhyno. That sounds good; I will try it and see
whether it works better.

2015-05-27 18:21 GMT+02:00 Art Rhyno. <[email protected]>:

> You could try leveraging the coordinates for the words (available in the
> hOCR output) or the letters themselves (via the API) and processing the
> title differently based on the size of the letters. A Difference of
> Gaussians or another type of filter could thin the letters out, and you
> could also try Tesseract in single-character mode if you can isolate each
> letter. The bane of OCR for old newspapers tends to be multi-column
> printing, in which case a separate segmentation tool, like Olena, can be
> invaluable, but your sample does not suggest that columns are a factor.
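
To make the first suggestion concrete, here is a minimal sketch of how the
word coordinates in hOCR output might be used to pick out the title by letter
size. It is only an illustration under some assumptions: the hOCR fragment is
made up, the names split_title_words and SAMPLE_HOCR are hypothetical, the
regex expects the class attribute to come before the title attribute, and the
1.8 height ratio is an arbitrary threshold that would need tuning per scan.

```python
import re
from statistics import median

# Made-up hOCR fragment for illustration; real input would be the .hocr
# file that Tesseract writes for the scanned page.
SAMPLE_HOCR = """
<span class='ocrx_word' id='word_1_1' title='bbox 40 30 420 110; x_wconf 71'>GAZETTE</span>
<span class='ocrx_word' id='word_1_2' title='bbox 40 150 95 172; x_wconf 93'>The</span>
<span class='ocrx_word' id='word_1_3' title='bbox 105 150 190 173; x_wconf 90'>council</span>
<span class='ocrx_word' id='word_1_4' title='bbox 200 151 250 172; x_wconf 88'>met</span>
"""

# Matches one ocrx_word span and captures its bbox (x0 y0 x1 y1) and content.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*?bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.S,
)


def words_with_heights(hocr):
    """Return (text, (x0, y0, x1, y1), height) for every word in the hOCR."""
    words = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr):
        text = re.sub(r"<[^>]+>", "", inner).strip()  # drop nested markup
        box = (int(x0), int(y0), int(x1), int(y1))
        words.append((text, box, box[3] - box[1]))
    return words


def split_title_words(hocr, ratio=1.8):
    """Split words into (title, body) by comparing each box height with the
    median height. The ratio is an assumed threshold, not a Tesseract value."""
    words = words_with_heights(hocr)
    if not words:
        return [], []
    typical = median(height for _, _, height in words)
    title = [w for w in words if w[2] > ratio * typical]
    body = [w for w in words if w[2] <= ratio * typical]
    return title, body


if __name__ == "__main__":
    title, body = split_title_words(SAMPLE_HOCR)
    print("title words:", [w[0] for w in title])
    print("body words:", [w[0] for w in body])
```

The boxes flagged as title words could then be cropped from the image,
filtered separately, and passed to Tesseract again with a page segmentation
mode suited to a single line or a single character.
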
>
>
>
> art
>
>
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of *Claudi Ruiz
> *Sent:* Tuesday, May 26, 2015 4:25 AM
> *To:* [email protected]
> *Subject:* [tesseract-ocr] Re: Improve the tesseract output from an old
> newspapers
>
>
>
> How can I improve detection for the title specifically?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/fc27a199-6df6-4533-9693-641ed5c460be%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/fc27a199-6df6-4533-9693-641ed5c460be%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
