Re: Tesseract 3 and paragraph separation

TP Fri, 23 Mar 2012 02:05:49 -0700

On Fri, Mar 23, 2012 at 1:19 AM, TP <[email protected]> wrote:
> If you aren't afraid of doing some programming, look at the code for
> TessBaseAPI::GetHOCRText. It uses
> res_it->IsAtBeginningOf(RIL_PARA) to figure out where each paragraph
> begins.
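A minimal sketch of the idea in the quoted paragraph, in C++ (Tesseract's own language). So that it compiles without libtesseract, the WordResult struct and the JoinWords function are hypothetical stand-ins. In real code each word's text would come from a tesseract::ResultIterator via GetUTF8Text(tesseract::RIL_WORD), whose returned char* must be released with delete[]. The paragraph flag would come from IsAtBeginningOf(tesseract::RIL_PARA).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one recognized word. In real Tesseract code the
// text would come from res_it->GetUTF8Text(tesseract::RIL_WORD) and the flag
// from res_it->IsAtBeginningOf(tesseract::RIL_PARA).
struct WordResult {
  std::string text;
  bool begins_paragraph;
};

// Joins words with single spaces, but inserts a blank line wherever a word
// begins a new paragraph (the same check GetHOCRText uses to find where
// each paragraph begins).
std::string JoinWords(const std::vector<WordResult>& words) {
  std::string out;
  for (const WordResult& w : words) {
    if (!out.empty()) {
      out += w.begins_paragraph ? "\n\n" : " ";
    }
    out += w.text;
  }
  return out;
}
```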


I took a look at TessBaseAPI::GetUTF8Text() [1], and that's an even
better place to start. You just need to add a linefeed after each
paragraph's text.

[1] 
http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.cpp#901
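The GetUTF8Text() route can be sketched the same way. This assumes a hypothetical list of per-paragraph strings, in reading order, whose text lines already end in '\n'. AddParagraphBreaks is an illustrative name, not a Tesseract function. Appending one extra linefeed after each paragraph's text leaves a blank line between paragraphs.

```cpp
#include <string>
#include <vector>

// Hypothetical input: the recognized text of each paragraph, in reading
// order, with every text line assumed to be terminated by '\n'.
std::string AddParagraphBreaks(const std::vector<std::string>& paragraphs) {
  std::string out;
  for (const std::string& para : paragraphs) {
    out += para;
    out += '\n';  // the extra linefeed after each paragraph's text
  }
  return out;
}
```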

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
