On Fri, Mar 23, 2012 at 1:19 AM, TP <[email protected]> wrote:
> If you aren't afraid of doing some programming, look at the code for
> TessBaseAPI::GetHOCRText. It uses
> res_it->IsAtBeginningOf(RIL_PARA) to figure out where each paragraph
> begins.
I took a look at TessBaseAPI::GetUTF8Text() [1], and that's an even better place to start. You just need to add a linefeed after each paragraph's text.

[1] http://code.google.com/p/tesseract-ocr/source/browse/trunk/api/baseapi.cpp#901
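
To make the idea concrete, here is a rough sketch of the "linefeed after each paragraph" step. It is not the code from baseapi.cpp. JoinParagraphs and its vector argument are hypothetical stand-ins so the snippet compiles without Tesseract. In a real patch, I assume each string would come from walking a ResultIterator at the RIL_PARA level inside GetUTF8Text().

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Tesseract. Each element of `paragraphs`
// stands in for the UTF-8 text of one paragraph, which in the real API
// would (as an assumption) be fetched per paragraph from a ResultIterator
// stepping at the RIL_PARA level.
std::string JoinParagraphs(const std::vector<std::string>& paragraphs) {
  std::string text;
  for (const std::string& para : paragraphs) {
    if (para.empty()) continue;  // skip empty paragraphs
    text += para;
    text += '\n';  // the extra linefeed that marks the paragraph boundary
  }
  return text;
}
```

If each paragraph's text already ends with a newline after its last line, the extra linefeed shows up as a blank line between paragraphs in the output.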

