Answering my own question:
I am successfully doing the boxing part of the training with the following:
[... Do tesseract init, bootstrap on another training file, etc ...]

_tessApi->SetVariable("chop_enable", "n");
_tessApi->SetVariable("wordrec_enable_assoc", "n");
_tessApi->SetVariable("tessedit_create_boxfile", "y");

// 1 byte per pixel (8-bit greyscale image)
_tessApi->SetImage(imgOcr.bits(), imgOcr.width(), imgOcr.height(),
                   1, imgOcr.bytesPerLine());

// Box text for page 0, returned as a heap-allocated UTF-8 string
char *text = _tessApi->GetBoxText(0);

[... Do the boxing processing, this is handy because we don't need to write
that to a file for now ...]

// Free the buffer itself (not "*text", which is only its first char)
delete[] text;

The result is the box file contents, which I can process directly in my app
without doing any file I/O.
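
For anyone wanting to do the same, here is a minimal sketch of how the returned text could be parsed in memory. It assumes the usual box file layout of one entry per line, "symbol left bottom right top page", with coordinates measured from the bottom-left corner of the image. BoxEntry and parseBoxText are hypothetical names of my own, not part of the Tesseract API, and I have only tried this on simple input.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One entry of the box text: a symbol and its bounding box.
// Assumption: coordinates use a bottom-left origin, as in box files.
struct BoxEntry {
    std::string symbol;  // UTF-8 character(s) for this box
    int left = 0;
    int bottom = 0;
    int right = 0;
    int top = 0;
    int page = 0;
};

// Hypothetical helper (not a Tesseract function): parses box text that is
// already in memory, so nothing has to be written to disk.
std::vector<BoxEntry> parseBoxText(const std::string &boxText) {
    std::vector<BoxEntry> entries;
    std::istringstream lines(boxText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        BoxEntry entry;
        if (!(fields >> entry.symbol >> entry.left >> entry.bottom
                     >> entry.right >> entry.top)) {
            continue;  // skip blank or malformed lines
        }
        // Assumption: the page field may be missing; fall back to page 0.
        if (!(fields >> entry.page)) {
            entry.page = 0;
        }
        entries.push_back(entry);
    }
    return entries;
}
```

With the snippet above, this would be called as parseBoxText(text) before the delete[], since the std::string copy no longer depends on the original buffer.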
Now I'm wondering about the next step, which is the training itself once the
charset has been generated. Can tesseract be trained without a box file, using
instead whatever in-memory representation it builds when it reads one? I want
to avoid writing files to the user's hard drive, so passing the boxes directly
to the API would be very nice. I'm going to dig a bit into the sources.
I'll keep you posted whether or not I find what I'm looking for.
Thanks,
Pierre.