devTess,

Be careful with coffee, don't overdose ))

> Q1
> Init(datapath, language, OcrEngineMode);
> What is the normal setting of OcrEngineMode?

Currently OcrEngineMode = OEM_TESSERACT_ONLY would be sufficient for all cases.

> Q2: which of the following is USED In normal running mode of
> tesseract.exe to recognize text

The values of the variables you can see within the code of Recognize() (e.g. tesseract_->tessedit_resegment_from_boxes) are often loaded from config files. Usually recognition runs with no config files at all, so you can assume all these variables to be "false". That way you can trace the control paths and figure out which procedures get called at the recognition stage.

> Q3: which of the following is USED In normal running mode of
> tesseract.exe to recognize text

You meant "to train" - copy-paste. Training is a 2-stage process:

1) Making box files. Requires two config files: "batch.nochop" and "makebox"
2) Generation of .tr files. Needs "nobatch" and "box.train"

You can find the above configs inside the tessdata/configs and tessdata/tessconfigs directories in Tess's distribution. Check these files and you'll understand what usually happens while training. Plain old step-by-step debugging is also of use ))

Warm regards,
Dmitry Silaev

On Tue, Feb 8, 2011 at 6:44 PM, devTess <jim...@googlemail.com> wrote:
>
> Hi Dimitry, with the guidelines provided from you, I prepared a strong
> cup of coffee and started reading the top part of baseapi.h
>
> Q1
> Init(datapath, language, OcrEngineMode);
> What is the normal setting of OcrEngineMode?
>
> I am trying to use the Recognize(ETEXT_DESC* monitor) method.
>
> >>> There are two PARTS to the Recognize method
>
> Part ONE:
> Q2: which of the following is USED In normal running mode of
> tesseract.exe to recognize text
>
>   if (tesseract_->tessedit_resegment_from_line_boxes)
>     page_res_ = tesseract_->ApplyBoxes(*input_file_, true, block_list_);
>   else if (tesseract_->tessedit_resegment_from_boxes)
>     page_res_ = tesseract_->ApplyBoxes(*input_file_, false, block_list_);
>   else
>     page_res_ = new PAGE_RES(block_list_,
>                              &tesseract_->prev_word_best_choice_);  <<My guess>
>   if (tesseract_->tessedit_make_boxes_from_boxes) {
>     tesseract_->CorrectClassifyWords(page_res_);
>     return 0;
>   }
>
> Part TWO:
> Q3: which of the following is USED In normal running mode of
> tesseract.exe to recognize text
>
>   if (tesseract_->interactive_mode) {
>     tesseract_->pgeditor_main(rect_width_, rect_height_, page_res_);
>     // The page_res is invalid after an interactive session, so cleanup
>     // in a way that lets us continue to the next page without crashing.
>     delete page_res_;
>     page_res_ = NULL;
>     return -1;
>   } else if (tesseract_->tessedit_train_from_boxes) {
>     tesseract_->ApplyBoxTraining(*output_file_, page_res_);
>   } else if (tesseract_->tessedit_ambigs_training) {
>     FILE *training_output_file =
>         tesseract_->init_recog_training(*input_file_);
>     // OCR the page segmented into words by tesseract.
>     tesseract_->recog_training_segmented(
>         *input_file_, page_res_, monitor, training_output_file);
>     fclose(training_output_file);
>   } else {
>     // Now run the main recognition.
>     tesseract_->recog_all_words(page_res_, monitor, NULL, NULL, 0);  <<My guess>
>   }
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
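
P.S. To make the "assume everything is false" reasoning concrete, here is a rough standalone sketch of the branch selection in the quoted Recognize() code. It does not use Tesseract at all: RecognizeFlags, SegmentationBranch and RecognitionBranch are hypothetical names made up for illustration, and the returned strings merely name the call that the quoted code would reach. The flag names are the ones from the quoted snippet; which flags a given training config actually sets is something to check in the config files themselves.

```cpp
#include <string>

// Hypothetical stand-in for the config variables tested in the quoted
// Recognize() code. With no config file loaded we assume, as suggested
// in the reply, that every one of them is false.
struct RecognizeFlags {
  bool tessedit_resegment_from_line_boxes = false;
  bool tessedit_resegment_from_boxes = false;
  bool tessedit_make_boxes_from_boxes = false;
  bool interactive_mode = false;
  bool tessedit_train_from_boxes = false;
  bool tessedit_ambigs_training = false;
};

// Part ONE of the quoted code: how page_res_ gets built.
std::string SegmentationBranch(const RecognizeFlags& f) {
  if (f.tessedit_resegment_from_line_boxes) return "ApplyBoxes(..., true, ...)";
  if (f.tessedit_resegment_from_boxes) return "ApplyBoxes(..., false, ...)";
  return "new PAGE_RES";
}

// Part TWO of the quoted code: what runs on page_res_ afterwards.
// tessedit_make_boxes_from_boxes is checked first because the quoted
// code returns early on it, before part two is ever reached.
std::string RecognitionBranch(const RecognizeFlags& f) {
  if (f.tessedit_make_boxes_from_boxes) return "CorrectClassifyWords";
  if (f.interactive_mode) return "pgeditor_main";
  if (f.tessedit_train_from_boxes) return "ApplyBoxTraining";
  if (f.tessedit_ambigs_training) return "recog_training_segmented";
  return "recog_all_words";
}
```

With a default-constructed RecognizeFlags, the two functions select "new PAGE_RES" and "recog_all_words", which matches both of the "My guess" marks in the question. Setting tessedit_resegment_from_boxes and tessedit_train_from_boxes to true (an assumption about what a training config might do, not something read from the actual files) switches the path to ApplyBoxes(..., false, ...) followed by ApplyBoxTraining.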