Answering my own question:

I have the boxing part of the training working with the following:

[... Do tesseract init, bootstrap on another training file, etc ...]
_tessApi->SetVariable("chop_enable", "n");
_tessApi->SetVariable("wordrec_enable_assoc", "n");
_tessApi->SetVariable("tessedit_create_boxfile", "y");
// The 4th argument is the number of bytes per pixel.
_tessApi->SetImage(imgOcr.bits(), imgOcr.width(), imgOcr.height(), 1,
                   imgOcr.bytesPerLine());
char *text = _tessApi->GetBoxText(0);
[... Do the boxing processing, this is handy because we don't need to write
that to a file for now ...]
delete[] text;  // free the buffer itself, not *text

The result is the contents of a box file, which I can process directly in my
app without doing any file I/O.
Now I'm wondering about the next step, which is the training itself, after the
charset generation. Can Tesseract be trained without a box file, using instead
some kind of binary format it would otherwise have generated by reading the box
file? I want to avoid putting files on the user's hard drive, so passing the
boxes directly to the API would be very nice. I'm going to dig a bit into the
sources.
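
For anyone who wants to do the same in-memory processing of the box text, here
is a minimal sketch of one way to parse it. It assumes the usual box layout of
one glyph per line, "symbol left bottom right top page", with the page number
treated as optional. The names Box and parseBoxText are made up for this
example and are not part of the Tesseract API.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line of box text (not a Tesseract type).
struct Box {
    std::string symbol;
    int left = 0, bottom = 0, right = 0, top = 0, page = 0;
};

// Parse box text held in memory. Assumed line layout:
//   <symbol> <left> <bottom> <right> <top> [<page>]
// Lines that do not match this layout are skipped.
std::vector<Box> parseBoxText(const std::string &text) {
    std::vector<Box> boxes;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        Box box;
        if (!(fields >> box.symbol >> box.left >> box.bottom
                     >> box.right >> box.top)) {
            continue;  // blank or malformed line
        }
        fields >> box.page;  // optional field; page is 0 when it is absent
        boxes.push_back(box);
    }
    return boxes;
}
```

With the snippet above, this would be called as parseBoxText(text) between
GetBoxText(0) and delete[] text, since the std::string copy made for the call
does not depend on the original buffer afterwards.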

I'll keep you posted on whether or not I find what I'm looking for.

Thanks,
Pierre.
