Try psm 6, and also 11 and 12.

https://github.com/tesseract-ocr/tesseract/issues/434
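Here is a minimal Python sketch for comparing those modes side by side. It assumes Tesseract 4.x or later is on the PATH, where the option is spelled `--psm`. The 3.x releases that were current in 2016 spelled it `-psm`. The helper names `build_command` and `run_sweep` and the `<stem>_psm<N>` output naming are hypothetical. The mode descriptions are the ones printed by `tesseract --help-psm`.

```python
import subprocess
from pathlib import Path
from typing import List

# Candidate page segmentation modes, described as in `tesseract --help-psm`.
PSM_CANDIDATES = {
    6: "Assume a single uniform block of text",
    11: "Sparse text. Find as much text as possible in no particular order",
    12: "Sparse text with OSD",
}


def build_command(image: str, psm: int, out_dir: str = ".") -> List[str]:
    """Build a tesseract command that writes <out_dir>/<stem>_psm<N>.txt."""
    stem = Path(image).stem
    out_base = str(Path(out_dir) / f"{stem}_psm{psm}")
    # Tesseract adds the ".txt" extension to the output base itself.
    return ["tesseract", image, out_base, "--psm", str(psm)]


def run_sweep(image: str, out_dir: str = ".") -> None:
    """Run tesseract once per candidate mode so the outputs can be diffed."""
    for psm, desc in PSM_CANDIDATES.items():
        cmd = build_command(image, psm, out_dir)
        print(f"psm {psm} ({desc}): {' '.join(cmd)}")
        subprocess.run(cmd, check=True)  # raises if tesseract exits non-zero
```

Call `run_sweep("index.png")` and diff the resulting `index_psm6.txt`, `index_psm11.txt` and `index_psm12.txt`. Mode 6 treats the page as one uniform block, so entries from different columns that share a row are likely to land on the same output line. Modes 11 and 12 look for scattered text without assuming a reading order, which may keep widely separated entries apart.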

On 13 Oct 2016 1:13 p.m., "fuzzy7k" <[email protected]> wrote:

> I tried psm 0-3
>
> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>>
>> Which page segmentation mode (psm) did you try?
>>
>> On 12 Oct 2016 11:21 p.m., "fuzzy7k" <[email protected]> wrote:
>>
>>> I have scanned some index pages that I would like to OCR for rapid
>>> searching. I am using tesseract from the command line. The problem is that
>>> tesseract ignores the whitespace between columns and merges everything
>>> together, essentially fragmenting the contents. Using some debug output, I
>>> see that no "columns" are detected. Probably more important is that three
>>> "blocks" are detected: one around the first line, one around the last line,
>>> and one encompassing everything in between. Is there a way to train block
>>> detection, or are there some parameters that I can tweak to optimize this?
>>>
>>> I have attached the image merely as an abstract representation of the
>>> text layout, to show the types of columns I am dealing with. Ideally, I
>>> would also like to know whether tab stops can be trained and used to put
>>> each individual topic on one line. I could do that in post-processing if I
>>> could get the tab stops printed.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/5b4800f9-cead-4959-9260-52e98ee596b7%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>
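On the tab-stop question: Tesseract does not print tab stops directly. Newer releases (3.05 and later) can write word-level bounding boxes with the `tsv` config, e.g. `tesseract page.png page --psm 6 tsv`, which produces `page.tsv`. Below is a post-processing sketch in Python. It is a heuristic workaround, not Tesseract's own tab-stop detection. It assumes entries in different columns are separated by noticeably wider gaps than ordinary word spacing. It groups word rows (level 5) by page/block/paragraph/line, then starts a new cell wherever the gap to the previous word exceeds a threshold. The function name `split_columns` and the 40-pixel default are hypothetical and need tuning to the scan resolution.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

WORD_LEVEL = 5  # level 5 rows in Tesseract TSV output are individual words


def split_columns(tsv_text: str, gap: int = 40) -> List[List[str]]:
    """Group words into lines, then split each line into cells at wide gaps.

    `gap` is a hypothetical pixel threshold; tune it to the scan resolution.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines: Dict[Tuple[int, int, int, int], List[Tuple[int, int, str]]] = defaultdict(list)
    for row in reader:
        text = (row.get("text") or "").strip()
        if int(row["level"]) != WORD_LEVEL or not text:
            continue  # skip page/block/paragraph/line rows and empty words
        key = (int(row["page_num"]), int(row["block_num"]),
               int(row["par_num"]), int(row["line_num"]))
        left = int(row["left"])
        lines[key].append((left, left + int(row["width"]), text))

    result: List[List[str]] = []
    for key in sorted(lines):
        words = sorted(lines[key])  # left-to-right order
        cells: List[str] = [words[0][2]]
        prev_right = words[0][1]
        for left, right, text in words[1:]:
            if left - prev_right > gap:
                cells.append(text)       # wide gap: start a new column cell
            else:
                cells[-1] += " " + text  # normal word spacing: same cell
            prev_right = right
        result.append(cells)
    return result
```

Joining each row with `"\t".join(cells)` then gives one tab-separated line per printed row, which can be searched or split further.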
