OK, I'm digging into the sources for the next step of using the API to train
Tesseract...

So far, I am able to make the boxes and get the result as a char* without
writing to a box file. Now I'm trying to run the training, but I can't seem to
find a way to do that without using external files (the box file and the output
file)... Is there a way to do that?

_tessApi = new tesseract::TessBaseAPI();
_tessApi->Init("./", "eng");
// Initialize variables.
_tessApi->SetInputName("Test.box");
_tessApi->SetOutputName("Test");
_tessApi->SetVariable("file_type", ".bl");
_tessApi->SetVariable("tessedit_single_match", "0");
_tessApi->SetVariable("tessedit_zero_rejection", "T");
_tessApi->SetVariable("tessedit_minimal_rejection", "F");
_tessApi->SetVariable("tessedit_write_rep_codes", "F");
_tessApi->SetVariable("tessedit_resegment_from_boxes", "T");
_tessApi->SetVariable("tessedit_train_from_boxes", "T");
_tessApi->SetVariable("textord_fast_pitch_test", "T");
_tessApi->SetVariable("textord_no_rejects", "T");
_tessApi->SetVariable("edges_children_fix", "F");
_tessApi->SetVariable("edges_childarea", "0.65");
_tessApi->SetVariable("edges_boxarea", "0.9");
_tessApi->SetVariable("il1_adaption_test", "1");
_tessApi->SetPageSegMode(tesseract::PSM_AUTO_OSD);
// Prepare picture.
_tessApi->SetImage(imgOcr.bits(), imgOcr.width(), imgOcr.height(), 1,
                   imgOcr.bytesPerLine());
_tessApi->Recognize(0);
_tessApi->End();
delete _tessApi;

I see that internally, when doing that, Tesseract goes through the
ApplyBoxTraining routine... However, this routine takes a filename as one of
its arguments and opens the file itself. It would be very tempting to re-use
the code from ApplyBoxes itself, but I feel that if something changes in the
source code in future versions, I'll have to start over again...
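
For illustration, reading box data straight from memory instead of from a file
could look roughly like the sketch below. It is only a sketch, under the
assumption that the in-memory text uses the usual box layout of one
"<symbol> <left> <bottom> <right> <top> <page>" entry per line. BoxEntry and
ParseBoxText are hypothetical names made up for this example; they are not part
of the Tesseract API, and this is not how ApplyBoxes itself is implemented.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch; not part of the Tesseract API.
struct BoxEntry {
    std::string symbol;  // UTF-8 text of the character
    int left;            // box coordinates in pixels,
    int bottom;          // measured from the bottom-left
    int right;           // corner of the image
    int top;
    int page;            // page number, 0 if the field is absent
};

// Parses box text held in memory, one entry per line.
// Lines that do not start with a symbol and four integers are skipped.
std::vector<BoxEntry> ParseBoxText(const char* box_text) {
    std::vector<BoxEntry> boxes;
    if (box_text == nullptr) {
        return boxes;
    }
    std::istringstream input(box_text);
    std::string line;
    while (std::getline(input, line)) {
        std::istringstream fields(line);
        BoxEntry entry;
        if (fields >> entry.symbol >> entry.left >> entry.bottom
                   >> entry.right >> entry.top) {
            // The page field is treated as optional here (an assumption).
            if (!(fields >> entry.page)) {
                entry.page = 0;
            }
            boxes.push_back(entry);
        }
    }
    return boxes;
}
```

Passing the char* obtained from the API to ParseBoxText would give a vector of
boxes without touching the disk; it does not by itself feed them back into the
training routine, which still expects a filename.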

Also, how can I get the result of the training as binary data instead of
specifying an output file (UNIX philosophy, anyone)?
I'm not sure I will be able to do what I want, which is to avoid files
entirely, especially when it comes to mftraining and friends...

Thanks,
Pierre.
