I just did more testing.

My one-word (or single-character) image works with
-psm 7
-psm 8

My image with two or three lines of text works with the default,
-psm 3
as well as
-psm 4

They both seem to work with
-psm 6

I may have to go with 6, even though, based on its description, 4 is the
mode that should handle my three-line test with different font sizes.
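The comparison above (which modes give the right answer on every test image) could be automated with something like the following sketch. It is only an illustration under stated assumptions: run_tesseract and working_modes are hypothetical helper names, it assumes a Tesseract 4+ CLI (which spells the flag --psm; the 3.x builds discussed in this thread use -psm) that accepts "stdout" as the output base, and the exact-match comparison is a simplification of judging OCR output by eye.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def run_tesseract(image_path: str, psm: int) -> str:
    """Run the tesseract CLI once and return its text output.

    Assumption: Tesseract 4+ (flag spelled --psm; 3.x used -psm) and a
    build that accepts "stdout" as the output base.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", str(psm)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout


def working_modes(
    expected: Dict[str, str],
    modes: Sequence[int] = (3, 4, 6, 7, 8),
    run: Callable[[str, int], str] = run_tesseract,
) -> List[int]:
    """Return the PSMs whose stripped output equals the expected text
    for every image in `expected` (image path -> expected text).

    Exact string comparison is a deliberate simplification; real OCR
    output may need normalization (whitespace, case) before comparing.
    """
    return [
        psm
        for psm in modes
        if all(run(image, psm).strip() == text for image, text in expected.items())
    ]
```

For instance, working_modes({"word.png": "Hi", "lines.png": "first line\nsecond line"}) (hypothetical file names) would list the modes that handle both images, which in the tests described above would be 6. The run parameter exists so the comparison logic can be exercised without Tesseract installed.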

I'd consider it a bug that 3 and 4 can't reliably handle simpler content.
Do I really have to analyze the segmentation myself to get the most out of Tesseract?!

That is why I went to the trouble of compiling Leptonica: so that Tesseract
would be smart enough that I don't have to reinvent the wheel.


It seems to be failing at the segmentation stage. If it finds nothing, it
could automatically try again with a more primitive setting. That would be
far more efficient than having my process spawn tesseract twice as often.
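Until Tesseract does that retry internally, a client-side version of the logic might look like this minimal sketch. Nothing here is a Tesseract feature: run_tesseract and ocr_with_fallback are hypothetical names, the sketch assumes a Tesseract 4+ CLI (flag spelled --psm; 3.x builds use -psm) that accepts "stdout" as the output base, and the fallback order 3, 6, 7 is an assumption drawn from the results above. Note that it still spawns one process per attempt, which is exactly the overhead an in-library retry would avoid.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple


def run_tesseract(image_path: str, psm: int) -> str:
    """Run the tesseract CLI once and return its text output.

    Assumption: Tesseract 4+ (flag spelled --psm; 3.x used -psm) and a
    build that accepts "stdout" as the output base.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", str(psm)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout


def ocr_with_fallback(
    image_path: str,
    psm_order: Sequence[int] = (3, 6, 7),
    run: Callable[[str, int], str] = run_tesseract,
) -> Optional[Tuple[int, str]]:
    """Try each PSM in turn, from most to least automatic.

    Returns (psm, text) for the first mode whose output is not blank,
    or None if every mode found nothing. Later modes are only run when
    earlier ones come back empty.
    """
    for psm in psm_order:
        text = run(image_path, psm).strip()
        if text:
            return psm, text
    return None
```

In the common case where the default mode succeeds, only one process is spawned; the extra runs happen only for images where segmentation finds nothing.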

    thanks
    scott

On Thursday, April 21, 2016 at 4:21:47 AM UTC-7, zdenop wrote:
>
> Please read the wiki 
> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method
>
> Zdenko