Thank you very much, Art Rhyno. That sounds good; I will try it and see whether it works better.
2015-05-27 18:21 GMT+02:00 Art Rhyno <[email protected]>:

> You could try leveraging the coordinates for the words (available in the
> hocr output) or the letters themselves (via the API) and doing different
> processing for the title based on the size of the letters. Difference of
> Gaussians or another type of filter could thin the letters out, and you
> could also try tesseract in single character mode if you can isolate each
> letter. The bane of ocr for old newspapers tends to be multi-columned
> printing, in which case a separate segmentation tool, like olena, can be
> invaluable, but your sample does not suggest that columns are a factor.
>
> art
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Claudi Ruiz
> *Sent:* Tuesday, May 26, 2015 4:25 AM
> *To:* [email protected]
> *Subject:* [tesseract-ocr] Re: Improve the tesseract output from an old
> newspapers
>
> How can I improve detection for the title specifically?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/fc27a199-6df6-4533-9693-641ed5c460be%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
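
For anyone who wants to try the word-coordinate suggestion from the quoted message, here is a minimal sketch of one way it might look. It assumes the hOCR output marks each word as a span with class 'ocrx_word' and a 'bbox x0 y0 x1 y1' entry in its title attribute. The names HocrWords, split_by_height, the 1.5 ratio, and the SAMPLE fragment are all made up for illustration and are not part of tesseract; the threshold would need tuning on real pages.

```python
import re
from html.parser import HTMLParser
from statistics import median

# Assumption: each word's title attribute contains "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "ocrx_word":
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, *self._bbox))
            self._bbox = None


def split_by_height(words, ratio=1.5):
    """Split words into (large, normal) by box height relative to the median.

    The ratio is a hypothetical starting point, not a recommended value.
    """
    if not words:
        return [], []
    heights = [y1 - y0 for _, _, y0, _, y1 in words]
    cutoff = ratio * median(heights)
    large = [w for w, h in zip(words, heights) if h > cutoff]
    normal = [w for w, h in zip(words, heights) if h <= cutoff]
    return large, normal


# Hypothetical hOCR fragment: two tall title words followed by body text.
SAMPLE = """
<span class='ocr_line' title='bbox 30 20 400 80'>
<span class='ocrx_word' id='word_1_1' title='bbox 30 20 200 80; x_wconf 61'>DAILY</span>
<span class='ocrx_word' id='word_1_2' title='bbox 220 22 400 80; x_wconf 58'>NEWS</span>
</span>
<span class='ocr_line' title='bbox 30 100 300 120'>
<span class='ocrx_word' id='word_1_3' title='bbox 30 100 80 120; x_wconf 90'>The</span>
<span class='ocrx_word' id='word_1_4' title='bbox 90 100 150 120; x_wconf 91'>town</span>
<span class='ocrx_word' id='word_1_5' title='bbox 160 100 210 120; x_wconf 89'>fair</span>
<span class='ocrx_word' id='word_1_6' title='bbox 220 100 300 120; x_wconf 92'>opens</span>
</span>
"""

parser = HocrWords()
parser.feed(SAMPLE)
title_words, body_words = split_by_height(parser.words)
```

The boxes in title_words could then be cropped from the page image and given their own preprocessing before a second OCR pass, while body_words are left as they are.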

