On Thu, Mar 8, 2012 at 11:11 AM, Dmitri Silaev <[email protected]> wrote:
> As for existence and effects of specific parameters, currently I don't
> see any other way to find it out but digging in Tesseract's code.

If you are on Windows, I wrote a section on TCC/LE [1] that explains
how to use its "ffind" command to display all (most? some?) of the
configuration parameters defined in the tesseract-ocr source files
(which is not the same thing as those parameters actually being
*used* to do anything). It also mentions how you can do something
similar with Visual Studio 2008, or with the bash shell on Linux.

You can also put the following in a config file called, for example,
config-write-params.txt:

   tessedit_write_params_to_file   currentparams.txt
   tessdata_manager_debug_level    1

(NOTE: this file *MUST* use Unix-style line endings, that is, only a
Linefeed character, *NOT* the Windows convention of a Carriage Return
followed by a Linefeed.)
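If your Windows editor insists on saving Carriage Return + Linefeed, a
small C++17 helper can write the config file with bare Linefeeds, or
strip the Carriage Returns from an existing file. This is only a
sketch; WriteUnixConfig and StripCarriageReturns are hypothetical
names, not part of Tesseract.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Writes each line followed by a bare '\n'. Opening the stream in binary
// mode stops the Windows C++ runtime from translating '\n' into "\r\n".
bool WriteUnixConfig(const std::string& path,
                     const std::vector<std::string>& lines) {
  std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
  if (!out) return false;
  for (const std::string& line : lines) {
    out << line << '\n';
  }
  return static_cast<bool>(out);
}

// Rewrites an existing file, dropping every '\r', so a config saved with
// CR+LF line endings (e.g. by Notepad) becomes LF-only.
bool StripCarriageReturns(const std::string& path) {
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in) return false;
  std::string data((std::istreambuf_iterator<char>(in)),
                   std::istreambuf_iterator<char>());
  in.close();
  data.erase(std::remove(data.begin(), data.end(), '\r'), data.end());
  std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out << data;
  return static_cast<bool>(out);
}
```

For example, WriteUnixConfig("config-write-params.txt",
{"tessedit_write_params_to_file currentparams.txt",
"tessdata_manager_debug_level 1"}) produces an equivalent file.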

Then do:

   tesseract.exe eurotext.tif eurotext config-write-params.txt

You'll see:

   Wrote parameters to currentparams.txt
   Loading Tesseract/Cube with tessedit_ocr_engine_mode 0
   Loaded unicharset
   Loaded ambigs
   Loaded language 'eng' as main language
   Tesseract Open Source OCR Engine v3.02 with Leptonica

And looking at the newly created currentparams.txt you'll see something like:

   textord_debug_tabfind        0
   textord_debug_bugs   0
   textord_testregion_left      -1
   ...
   textord_noise_hfract 0.015625
   textord_noise_rowratio       6
   textord_blshift_maxshift     0
   textord_blshift_xfraction    9.99

(over 660 lines in my case). Unfortunately this file omits the
description strings that appear in the source files, but otherwise
it gives a pretty good idea of what can be set. Searching the source
for a particular parameter will then give some insight into what it
does. For example, with TCC/LE, try searching the source for
"tessedit_write_params_to_file":

   ffind /s/v/t"tessedit_write_params_to_file" *.cpp

which gives:

   ---- TesseractSVN\ccmain\tessedit.cpp
     if (((STRING &)tessedit_write_params_to_file).length() > 0) {
       FILE *params_file = fopen(tessedit_write_params_to_file.string(), "wb");
                   tessedit_write_params_to_file.string());
                 tessedit_write_params_to_file.string());

   ---- TesseractSVN\ccmain\tesseractclass.cpp
       STRING_MEMBER(tessedit_write_params_to_file, "",

     5 lines in      2 files
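On a system without TCC/LE, roughly the same search (recursive,
literal text, *.cpp files only) takes a few lines of C++17
std::filesystem. FindInSources and Match are hypothetical names, and
this is not how ffind itself is implemented.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

struct Match {
  fs::path file;
  int line_number;  // 1-based
  std::string text;
};

// Walks `root` recursively and returns every line, in every regular file
// whose extension equals `ext` (e.g. ".cpp"), that contains `needle`.
std::vector<Match> FindInSources(const fs::path& root, const std::string& ext,
                                 const std::string& needle) {
  std::vector<Match> matches;
  for (const fs::directory_entry& entry :
       fs::recursive_directory_iterator(root)) {
    if (!entry.is_regular_file() ||
        entry.path().extension().string() != ext) {
      continue;
    }
    std::ifstream in(entry.path());
    std::string line;
    int line_number = 0;
    while (std::getline(in, line)) {
      ++line_number;
      if (line.find(needle) != std::string::npos) {
        matches.push_back({entry.path(), line_number, line});
      }
    }
  }
  return matches;
}
```

Unlike ffind on Windows, the extension test here is case-sensitive, so
files named *.CPP would be skipped.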

Opening ccmain\tessedit.cpp, we then see the following:

  if (((STRING &)tessedit_write_params_to_file).length() > 0) {
    FILE *params_file = fopen(tessedit_write_params_to_file.string(), "wb");
    if (params_file != NULL) {
      ParamUtils::PrintParams(params_file, this->params());
      fclose(params_file);
      if (tessdata_manager_debug_level > 0) {
        tprintf("Wrote parameters to %s\n",
                tessedit_write_params_to_file.string());
      }
    } else {
      tprintf("Failed to open %s for writing params.\n",
              tessedit_write_params_to_file.string());
    }
  }
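To make the control flow of that excerpt concrete, here is a
stand-alone imitation of it. It is not Tesseract code: ParamList
stands in for this->params(), and the "name<TAB>value" line format is
an assumption, since the excerpt does not show what
ParamUtils::PrintParams actually prints. It returns the message instead
of calling tprintf so that it is easy to test.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parameter list: (name, value) pairs.
using ParamList = std::vector<std::pair<std::string, std::string>>;

// Mirrors the tessedit.cpp excerpt: do nothing unless a file name was
// given; otherwise write one line per parameter ("wb", so no '\r' is
// added on Windows) and report success only when debug_level > 0.
// Returns the message that would be printed, or "" if none.
std::string MaybeWriteParams(const std::string& write_params_to_file,
                             int debug_level, const ParamList& params) {
  if (write_params_to_file.empty()) return "";
  FILE* params_file = std::fopen(write_params_to_file.c_str(), "wb");
  if (params_file == nullptr) {
    return "Failed to open " + write_params_to_file + " for writing params.\n";
  }
  for (const auto& p : params) {
    // Assumed format: name, a tab, the value.
    std::fprintf(params_file, "%s\t%s\n", p.first.c_str(), p.second.c_str());
  }
  std::fclose(params_file);
  if (debug_level > 0) {
    return "Wrote parameters to " + write_params_to_file + "\n";
  }
  return "";
}
```

This is also why the config file above sets tessdata_manager_debug_level
to 1: without it the file is still written, but the "Wrote parameters
to ..." line is not printed.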

and the declaration in ccmain\tesseractclass.h (a header, so the *.cpp
search above did not list it) shows:

  STRING_VAR_H(tessedit_write_params_to_file, "",
               "Write all parameters to the given file.");
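Roughly speaking, these macros turn each parameter into an object that
records its name, value and description and adds itself to a list, and
that list is what ParamUtils::PrintParams walks. The toy model below
only illustrates the idea; ToyStringParam, ToyEngine and
TOY_STRING_MEMBER are made-up names, and Tesseract's real definitions
in ccutil/params.h differ in detail.

```cpp
#include <string>
#include <vector>

class ToyStringParam;
using ToyRegistry = std::vector<const ToyStringParam*>;

// Toy parameter (hypothetical): stores name, value and description, and
// registers itself in `registry` when constructed.
class ToyStringParam {
 public:
  ToyStringParam(const char* value, const char* name, const char* comment,
                 ToyRegistry* registry)
      : value_(value), name_(name), comment_(comment) {
    registry->push_back(this);
  }
  const std::string& value() const { return value_; }
  const std::string& name() const { return name_; }
  const std::string& comment() const { return comment_; }

 private:
  std::string value_, name_, comment_;
};

// Hypothetical macro in the spirit of STRING_MEMBER: #name turns the
// member's identifier into the text name used for lookups and dumps.
#define TOY_STRING_MEMBER(name, val, comment, reg) \
  name(val, #name, comment, reg)

struct ToyEngine {
  ToyRegistry params;  // declared first, so it is initialised first
  ToyStringParam tessedit_write_params_to_file;
  ToyEngine()
      : TOY_STRING_MEMBER(tessedit_write_params_to_file, "",
                          "Write all parameters to the given file.",
                          &params) {}
  // The registry holds pointers to members, so copying would dangle.
  ToyEngine(const ToyEngine&) = delete;
  ToyEngine& operator=(const ToyEngine&) = delete;
};
```

A dump like currentparams.txt is then just a loop over the registry;
in this toy model the description is available too, via comment().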

[1] http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/tools.html#id2

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en