I have nearly the same problem, but I want to remove any non-text areas from a scanned image to improve Tesseract's accuracy. I have come across three methods: SegmentPage in tesseractclass.h, remove_non_text_area in osdetect.h and GetRegions in baseapi.h. I guess that all of these methods remove non-text areas from the page. If you have used any of them, please share some code samples so I can see how to use them. Thanks.
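A minimal sketch of the general idea, independent of which of those three functions supplies the regions: given a list of text rectangles (for example, boxes taken from GetRegions), paint every pixel outside them white before running OCR. This is not what those functions do internally. GrayImage, TextRect and MaskNonText are hypothetical names made up for this example, and it assumes an 8-bit grayscale, row-major buffer with no row padding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical axis-aligned text region, in pixel coordinates.
struct TextRect {
    int left, top, width, height;
};

// Hypothetical 8-bit grayscale page: row-major, 0 = black, 255 = white,
// exactly `width` bytes per row (no padding).
struct GrayImage {
    int width, height;
    std::vector<std::uint8_t> pixels;
};

// Returns a copy of `page` where every pixel outside all `regions` is white.
// Each rectangle is clamped to the image bounds before copying.
GrayImage MaskNonText(const GrayImage& page, const std::vector<TextRect>& regions) {
    // Start from an all-white page of the same size.
    GrayImage out{page.width, page.height,
                  std::vector<std::uint8_t>(page.pixels.size(), 255)};
    for (const TextRect& r : regions) {
        const int x0 = std::max(0, r.left);
        const int y0 = std::max(0, r.top);
        const int x1 = std::min(page.width, r.left + r.width);
        const int y1 = std::min(page.height, r.top + r.height);
        // Copy the original pixels back inside the (clamped) text region.
        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                const std::size_t i = static_cast<std::size_t>(y) * page.width + x;
                out.pixels[i] = page.pixels[i];
            }
        }
    }
    return out;
}
```

The masked buffer could then be handed to TessBaseAPI::SetImage(out.pixels.data(), out.width, out.height, 1, out.width), i.e. one byte per pixel and width bytes per line, so the whole page is still recognized in a single pass.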
> ________________________________
> From: Cory Nelson <[email protected]>
> To: [email protected]
> Sent: Monday, April 23, 2012 5:17 PM
> Subject: Manual page segmentation
>
> Hi all,
>
> I'm new to Tesseract, but I've been able to get pretty far with it in the past week. I'm using AnalyseLayout with PSM_AUTO to let Tesseract guess the page's segments. I then present these in a UI where the user can modify them.
>
> It's fairly easy to call SetRectangle and OCR each of these segments individually, but this appears to greatly hurt OCR quality (I guess because it has less to learn from). Is there a way to manually reset the segments to scan, or at least to prevent this quality loss?
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to [email protected]
> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
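On the SetRectangle quality loss in the quoted message: Tesseract's notes on improving output quality mention that text with no border around it can cause recognition problems, so one thing that may be worth trying (a guess, not a confirmed cause) is to pad each user-edited segment with a margin before passing it to SetRectangle. Segment and PadSegment are hypothetical names for this sketch, and the margin value is an assumption you would need to tune.

```cpp
#include <algorithm>

// Hypothetical rectangle in the (left, top, width, height) order that
// TessBaseAPI::SetRectangle takes.
struct Segment {
    int left, top, width, height;
};

// Grows `s` by `margin` pixels on every side, then clamps the result to an
// image_width x image_height page so the rectangle never leaves the image.
Segment PadSegment(const Segment& s, int margin, int image_width, int image_height) {
    const int x0 = std::max(0, s.left - margin);
    const int y0 = std::max(0, s.top - margin);
    const int x1 = std::min(image_width, s.left + s.width + margin);
    const int y1 = std::min(image_height, s.top + s.height + margin);
    // Guard against degenerate input producing a negative size.
    return Segment{x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0)};
}
```

Usage would then be something like api.SetRectangle(p.left, p.top, p.width, p.height) followed by api.GetUTF8Text() for each padded segment p, remembering to delete[] the returned string.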

