Hello,
I found the following tip for Tesseract, but when I pass this
parameter as -c textord_min_linesize 3.25 in Tesseract 4, I get an
error message. What is wrong?

Example 3: Line Size
Command
tesseract image.jpg outputfilename config
Command Line Arguments
None
Config Settings
textord_min_linesize 3.25
Notes
* textord_min_linesize seems to have an effect on the line heights
detected by Tesseract when it performs the layout analysis on the
image. The default value for this setting is 1.25.
* When set to 3.25, the "broken" line problem in the original baseline
output is corrected. Lower settings (for example, 3.0) do not
correct the "broken" lines.
* This setting causes other character recognition errors.
* The text in the output that is highlighted in red is again correctly
contained on a single line.
* The words highlighted in blue include extra characters that are the
result of "noise" (specks and imperfections in the image). None of
these have been corrected, but no new ones have appeared.
* Lines between "paragraphs" now appear in somewhat odd locations.
Again, there are NO lines between paragraphs on the source image.
* The garbage words at the end of the page do not appear.
* A small number of errors in individual words that appear in the
original output were corrected; a few other incorrect words changed
(but were still incorrect); and a small number of correct words are
now incorrect. These have been highlighted in purple.
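
For comparison, here is a minimal sketch of the two usual ways to hand this variable to the tesseract command. It rests on a few assumptions: the file name linesize.cfg is hypothetical, image.jpg and outputfilename are the placeholders from the tip, and Tesseract is assumed to fall back to the path as given when no config of that name exists in its tessdata/configs directory. The -c option takes the form NAME=VALUE with an equals sign, so a space between the name and the value is one likely cause of the error, though the exact message would be needed to confirm that. Whether the variable still influences the output with the Tesseract 4 LSTM engine is not something this sketch shows.

```shell
#!/bin/sh
# Assumptions: linesize.cfg is a hypothetical file name; image.jpg and
# outputfilename are the placeholders used in the tip above.

# A config file holds one "name value" pair per line, separated by whitespace.
printf 'textord_min_linesize 3.25\n' > linesize.cfg

if command -v tesseract >/dev/null 2>&1 && [ -f image.jpg ]; then
    # Variant 1: pass the config file as the trailing argument, as in the tip.
    tesseract image.jpg outputfilename linesize.cfg

    # Variant 2: set the variable inline. Note the equals sign: NAME=VALUE.
    tesseract image.jpg outputfilename -c textord_min_linesize=3.25
else
    # Keeps the sketch runnable on machines without tesseract or the image.
    echo "tesseract or image.jpg not found; skipping the OCR run" >&2
fi
```

Both variants write their result to outputfilename.txt, so run one at a time when comparing outputs.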