1. --print-parameters is not designed to create a config file.
2. There are init and non-init variables, and there can also be variables
set in the language data, etc.
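
To illustrate point 1, here is a minimal sketch in Python of why the raw dump does not work as a config file. It assumes that each line of the --print-parameters output has the form name<TAB>value<TAB>description, which is consistent with the description text showing up after the page separator in the Case B output quoted below. The function name params_to_config is hypothetical and not part of Tesseract. The sketch keeps only the name and value of each entry and drops the description. Because of point 2, loading the result is still not guaranteed to reproduce the default behaviour.

```python
def params_to_config(dump: str) -> str:
    """Reduce a --print-parameters style dump to 'name value' lines.

    Assumption: each parameter line is 'name<TAB>value<TAB>description'.
    """
    config_lines = []
    # Split on "\n" only: str.splitlines() would also break on the form feed
    # that appears as the value of page_separator.
    for raw in dump.split("\n"):
        fields = raw.split("\t")
        if len(fields) < 2:
            continue  # header line, blank line, or anything not tab-separated
        name, value = fields[0].strip(), fields[1]
        if not name or not value.strip():
            # Empty or whitespace-only values are skipped in this sketch;
            # whether they can be written safely as 'name value' is not
            # something this code tries to decide.
            continue
        config_lines.append(f"{name} {value.strip()}")
    return "\n".join(config_lines) + "\n" if config_lines else ""
```

The text produced by "tesseract --print-parameters" could be passed to this function and the returned string written to a file. Init variables would still need separate handling.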

Zdenko


On Thu, 5 Sep 2019 at 10:56, Jonathan Zwart <[email protected]>
wrote:

> *My Stack Overflow question refers
> (https://stackoverflow.com/questions/57794165/tesseract-differing-output-how-do-i-find-out-which-parameters-are-being-used).*
>
> Consider this <https://i.stack.imgur.com/usCwV.png> small png image
> depicting the word 'Account' in black on a white background.
>
> For this ground-truth image the output differs between the following two
> Tesseract command-line operations, with (A) better than (B). (B) is
> required in order for me as the user to have any hope of sensibly
> controlling Tesseract's large number of configuration parameters - but
> preferably with (A)'s excellent extraction performance.
>
>
> *Case A (no config file):*
>
> tesseract -v test.png test
>
> tesseract 4.1.0
>  leptonica-1.78.0
>   libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11
> : libwebp 1.0.3 : libopenjp2 2.3.1
>  Found AVX2
>  Found AVX
>  Found SSE
> Tesseract Open Source OCR Engine v4.1.0 with Leptonica
>
> cat test.txt
>
> Account
> ^L
>
>
> *Case B (using config file, which is obviously desirable, to avoid trying
> to discover the default parameters by brute force):*
>
> tesseract --print-parameters > tess_default.cfg
> tesseract -v test.png test test_default.cfg
>
> ccot
> ^L      Page separator (default is form feed control character)
>
> I believe the output should be the same in both cases, but it is not.
> *Q1. Why?* *Case A* is clearly more accurate in its output, but *Case B* is
> not very accurate.
>
> *Q2. How does one otherwise discover the current configuration of
> Tesseract if not using --print-parameters?*
>
> Thanks for all help.
>
>
> Environment:
>
> * **Tesseract Version**: 4.1.0
> * **Commit Number**: [executed: brew install tesseract]
> * **Platform**: macOS High Sierra 10.13.6 / Darwin redacted.office 17.7.0
> Darwin Kernel Version 17.7.0: Sun Jun  2 20:31:42 PDT 2019;
> root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
>
> --ENDS----
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/1f8290b8-6ffe-4610-bdf9-e7b336e64712%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/1f8290b8-6ffe-4610-bdf9-e7b336e64712%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
