*My Stack Overflow question refers:
<https://stackoverflow.com/questions/57794165/tesseract-differing-output-how-do-i-find-out-which-parameters-are-being-used>.*

Consider this small PNG image (<https://i.stack.imgur.com/usCwV.png>) 
depicting the word 'Account' in black on a white background.

For this ground-truth image the output differs between the following two 
Tesseract command-line operations, with (A) better than (B). I need (B) 
in order to have any hope of sensibly controlling Tesseract's large number 
of configuration parameters, but preferably with (A)'s excellent 
extraction performance.


*Case A (no config file):*

tesseract -v test.png test

tesseract 4.1.0
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : 
libwebp 1.0.3 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE
Tesseract Open Source OCR Engine v4.1.0 with Leptonica

cat test.txt

Account
^L


*Case B (using config file, which is obviously desirable, to avoid trying 
to discover the default parameters by brute force):*

tesseract --print-parameters > tess_default.cfg
tesseract -v test.png test test_default.cfg

ccot
^L      Page separator (default is form feed control character)

I believe the output should be the same in both cases, but it is not. 
*Q1. Why?* *Case A* recognises the word correctly, whereas *Case B* 
returns 'ccot', followed by what looks like the description text of the 
page separator parameter.
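One possible cause, stated here as an unconfirmed assumption: the `--print-parameters` listing seems to be tab-separated as name, value, description, while a config file expects only a name and a value per line. If so, the description text would be read as part of each value, which would fit the 'Page separator (default is ...)' text in the Case B output. The sketch below is a hypothetical helper (`params_to_config` is not part of Tesseract) that keeps only the first two columns; the column layout and the handling of empty values are both assumptions.

```python
def params_to_config(printed: str) -> str:
    """Reduce `tesseract --print-parameters` text to `name value` lines.

    Assumes each parameter line is tab-separated: name, value, description.
    """
    config_lines = []
    # Split on "\n" only: str.splitlines() would also break on the form feed
    # character that page_separator uses as its default value.
    for line in printed.split("\n"):
        fields = line.rstrip("\r").split("\t")
        if len(fields) < 2:
            continue  # header line or blank line, not a parameter
        name, value = fields[0].strip(), fields[1]
        if not name or value == "":
            continue  # assumption: skip empty values so the default stays
        config_lines.append(f"{name} {value}")
    return "\n".join(config_lines) + "\n"
```

To try it, capture the text printed by `tesseract --print-parameters`, pass it to `params_to_config`, and write the result to the config file named on the command line. Whether this brings Case B back in line with Case A would still need to be checked against the image above.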

*Q2. How does one otherwise discover the current configuration of Tesseract 
if not using --print-parameters?*

Thanks for all help.


Environment:

* **Tesseract Version**: 4.1.0
* **Commit Number**: [executed: brew install tesseract]
* **Platform**: macOS High Sierra 10.13.6 / Darwin redacted.office 17.7.0 
Darwin Kernel Version 17.7.0: Sun Jun  2 20:31:42 PDT 2019; 
root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64

--ENDS----
