1. --print-parameters is not designed to create a config file.
2. There are init and non-init variables; there could also be variables in the language data, etc.
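Point 1 can be illustrated with a minimal sketch. It rests on two assumptions. First, --print-parameters writes one parameter per line as name, value and description separated by tabs. Second, the config reader takes the name up to the first space or tab and treats the rest of the line, after any following spaces or tabs, as the value. The sample file and the helper config_value below are hypothetical. Under those assumptions an empty string value lets the description slide into the value slot. The page_separator value also picks up its description, which is consistent with the "^L Page separator (default is form feed control character)" text in Case B.

```shell
#!/bin/sh
# Hypothetical sample lines in the ASSUMED --print-parameters layout:
# name<TAB>value<TAB>description
printf 'page_separator\t\f\tPage separator (default is form feed control character)\n' > sample.cfg
printf 'tessedit_char_whitelist\t\tWhitelist of chars to recognize\n' >> sample.cfg

# config_value NAME FILE
# Prints the value that a reader following the ASSUMED parsing rule would
# assign to NAME: the name ends at the first space or tab, and the value is
# the rest of the line after the run of spaces/tabs that follows the name.
config_value() {
  awk -v want="$1" '{
    name = $0
    sub(/[ \t].*/, "", name)          # keep only the first field
    if (name != want) next
    value = $0
    sub(/^[^ \t]*[ \t]*/, "", value)  # drop the name and the whitespace after it
    print value
  }' "$2"
}

# The empty whitelist comes back non-empty: prints the description text.
config_value tessedit_char_whitelist sample.cfg

# The form feed comes back with the description appended; cat -v (common,
# though not POSIX) shows the form feed as ^L.
config_value page_separator sample.cfg | cat -v
```

Even if the description column were stripped, point 2 still applies: init-only parameters and values coming from the language data are not reproduced just by replaying a parameter dump.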
Zdenko

On Thu, 5 Sep 2019 at 10:56, Jonathan Zwart <[email protected]> wrote:

> *My stackoverflow question refers
> (https://stackoverflow.com/questions/57794165/tesseract-differing-output-how-do-i-find-out-which-parameters-are-being-used).*
>
> Consider this <https://i.stack.imgur.com/usCwV.png> small png image
> depicting the word 'Account' in black on a white background.
>
> For this ground-truth image the output differs between the following two
> Tesseract command-line operations, with (A) better than (B). (B) is
> required in order for me as the user to have any hope of sensibly
> controlling Tesseract's large number of configuration parameters - but
> preferably with (A)'s excellent extraction performance.
>
>
> *Case A (no config file):*
>
> tesseract -v test.png test
>
> tesseract 4.1.0
> leptonica-1.78.0
> libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11
> : libwebp 1.0.3 : libopenjp2 2.3.1
> Found AVX2
> Found AVX
> Found SSE
> Tesseract Open Source OCR Engine v4.1.0 with Leptonica
>
> cat test.txt
>
> Account
> ^L
>
>
> *Case B (using config file, which is obviously desirable, to avoid trying
> to discover the default parameters by brute force):*
>
> tesseract --print-parameters > tess_default.cfg
> tesseract -v test.png test test_default.cfg
>
> ccot
> ^L Page separator (default is form feed control character)
>
> I believe the output should be the same in both cases, but it is not.
> *Q1. Why?* *Case A* is clearly more accurate in its output, but *Case B* is
> not very accurate.
>
> *Q2. How does one otherwise discover the current configuration of
> Tesseract if not using --print-parameters?*
>
> Thanks for all help.
>
>
> Environment:
>
> * **Tesseract Version**: 4.1.0
> * **Commit Number**: [executed: brew install tesseract]
> * **Platform**: macOS High Sierra 10.13.6 / Darwin redacted.office 17.7.0
> Darwin Kernel Version 17.7.0: Sun Jun 2 20:31:42 PDT 2019;
> root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
>
> --ENDS----
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/1f8290b8-6ffe-4610-bdf9-e7b336e64712%40googlegroups.com

