OK, I'm digging into the sources for the next step: using the API to train
Tesseract.
So far, I am able to make the boxes and get the result as a char* without
writing to a box file. Now I'm trying to run the training, but I can't find a
way to do that without using external files (the box file and the output
file). Is there a way to do that?
_tessApi = new tesseract::TessBaseAPI();
_tessApi->Init ("./", "eng");
// Initialize variables.
_tessApi->SetInputName ("Test.box");
_tessApi->SetOutputName ("Test");
_tessApi->SetVariable ("file_type", ".bl");
_tessApi->SetVariable ("tessedit_single_match", "0");
_tessApi->SetVariable ("tessedit_zero_rejection", "T");
_tessApi->SetVariable ("tessedit_minimal_rejection", "F");
_tessApi->SetVariable ("tessedit_write_rep_codes", "F");
_tessApi->SetVariable ("tessedit_resegment_from_boxes", "T");
_tessApi->SetVariable ("tessedit_train_from_boxes", "T");
_tessApi->SetVariable ("textord_fast_pitch_test", "T");
_tessApi->SetVariable ("textord_no_rejects", "T");
_tessApi->SetVariable ("edges_children_fix", "F");
_tessApi->SetVariable ("edges_childarea", "0.65");
_tessApi->SetVariable ("edges_boxarea", "0.9");
_tessApi->SetVariable ("il1_adaption_test", "1");
_tessApi->SetPageSegMode (tesseract::PSM_AUTO_OSD);
// Prepare picture.
_tessApi->SetImage (imgOcr.bits(), imgOcr.width(), imgOcr.height(), 1, imgOcr.bytesPerLine());
_tessApi->Recognize (0);
_tessApi->End ();
delete _tessApi;
I see that internally, when doing that, Tesseract goes through the
ApplyBoxTraining routine. However, this routine takes a filename as one of
its arguments and opens the file itself. It would be very tempting to reuse the
code from ApplyBoxes itself, but I feel that if something changes in the source
code in future versions, I'll have to start over again.
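A fallback that at least hides the file from the rest of the program is to write the in-memory box data to a temporary file and hand that path to the routine. The sketch below is only that, a sketch: ScopedTempFile is a hypothetical helper of my own, not anything from Tesseract, it assumes C++17 for std::filesystem, and the "tessbox_" prefix is arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper, not part of Tesseract: writes in-memory data to a
// uniquely named file in the system temp directory and deletes the file
// again when the object goes out of scope.
class ScopedTempFile {
public:
    ScopedTempFile(const std::string& contents, const std::string& suffix) {
        static unsigned counter = 0;  // distinguishes files created in the same tick
        const auto ticks =
            std::chrono::steady_clock::now().time_since_epoch().count();
        path_ = std::filesystem::temp_directory_path() /
                ("tessbox_" + std::to_string(ticks) + "_" +
                 std::to_string(counter++) + suffix);

        std::ofstream out(path_, std::ios::binary | std::ios::trunc);
        if (!out) throw std::runtime_error("cannot create " + path_.string());
        out.write(contents.data(),
                  static_cast<std::streamsize>(contents.size()));
        out.close();
        if (!out) {
            std::error_code ec;
            std::filesystem::remove(path_, ec);  // do not leave a partial file
            throw std::runtime_error("cannot write " + path_.string());
        }
        assert(std::filesystem::exists(path_));
    }

    ~ScopedTempFile() {
        std::error_code ec;  // error_code overload: never throws from a destructor
        std::filesystem::remove(path_, ec);
    }

    ScopedTempFile(const ScopedTempFile&) = delete;
    ScopedTempFile& operator=(const ScopedTempFile&) = delete;

    // Path of the temporary file, valid for the lifetime of this object.
    std::string path() const { return path_.string(); }

private:
    std::filesystem::path path_;
};
```

With this, the box data held as a char* could go into a ScopedTempFile with a ".box" suffix, and path().c_str() could be passed wherever the code above passes "Test.box". Whether Tesseract accepts a box file at an arbitrary location like that is something I have not verified.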
Also, how can I get the result of the training as binary data instead of
specifying an output file (UNIX philosophy, anyone)?
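Failing a real in-memory interface, the output side can at least be reduced to "let Tesseract write the file, then read it straight back into a buffer". A minimal sketch of that read-back step follows; ReadFileBytes is a hypothetical helper of my own, not a Tesseract function, and it knows nothing about the format of the training output.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of Tesseract: loads a whole file into memory
// as raw bytes, throwing std::runtime_error if it cannot be opened or read.
std::vector<char> ReadFileBytes(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamoff size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);

    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(bytes.data(), size)) {
        throw std::runtime_error("cannot read " + path);
    }
    // Either nothing was read (empty file) or the whole file was.
    assert(static_cast<std::streamoff>(in.gcount()) == size);
    return bytes;
}
```

After Recognize() returns, this could be called on whatever file the run produced for the output name "Test" (I believe that is a .tr file, but that is an assumption to check against the Tesseract version in use), and the file could then be deleted.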
I'm not sure whether I will be able to do what I want, which is to avoid files
altogether, especially when it comes to mftraining and friends.
Thanks,
Pierre.