Hello,
I have recently been working on detecting the font size of text with
Tesseract on the Android platform. I found this post
http://pastebin.com/0dV84hBa and modified the GetHOCRText(int page_number)
function in "baseapi.cpp" as follows:
    const char *font_name;
    bool bold, italic, underlined, monospace, serif, smallcaps;
    int pointsize, font_id;
    const char* word = res_it->GetUTF8Text(RIL_WORD);
    if (word != 0) {
      font_name = res_it->WordFontAttributes(&bold, &italic, &underlined,
                                             &monospace, &serif, &smallcaps,
                                             &pointsize, &font_id);
      hocr_str += " !!!word: ";
      hocr_str += word;
      hocr_str += " !!!font_name: ";
      hocr_str += font_name;
      hocr_str += " !!!bold: ";
      hocr_str += bold;
      hocr_str += " !!!pointsize: ";
      hocr_str += pointsize;
    }
    delete[] word;
However, only the word text and the font name (const char*) are displayed
correctly, while "pointsize" (int) and "bold" (bool) both come out as
unreadable garbage characters (each shows up as a square).
Has anyone encountered this before?
Thanks
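
P.S. One possible explanation, offered as a guess rather than a confirmed
diagnosis: if the string class only accepts text or a single char on the
right-hand side of +=, then an int or bool is silently converted to one char
whose code equals the value (12 becomes a form feed, true becomes byte 1),
which would print as a square. The sketch below reproduces that effect with
std::string standing in for Tesseract's STRING class (an assumption); the
helper names AppendAsChar and AppendAsText are hypothetical and exist only
for illustration.

```cpp
#include <string>

// Mimics appending a number through a char overload: the value ends up as
// one raw byte in the string, not as its decimal digits.
std::string AppendAsChar(std::string out, int value) {
  out += static_cast<char>(value);
  return out;
}

// Appends the decimal text of the value, which is what the debug output
// in the hOCR string is meant to show.
std::string AppendAsText(std::string out, int value) {
  out += std::to_string(value);
  return out;
}
```

If this is indeed the cause, converting the values to text before appending
them (for example with snprintf into a small buffer, or with the STRING
class's add_str_int helper if the Tesseract version in use provides it)
should make "pointsize" and "bold" readable.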
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en