I tried psm 0 through 3.
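
A minimal sketch of how a sweep over psm 0-3 could be scripted, for anyone who wants to reproduce the comparison. The file names `index.png` and `out` are hypothetical, and the `--psm` spelling is an assumption about the installed version: tesseract 3.x releases spell the option `-psm`, so pass `flag="-psm"` there.

```python
import shutil
import subprocess


def build_commands(image, out_prefix, modes=range(0, 4), flag="--psm"):
    """Return one tesseract argument list per page segmentation mode.

    Each run writes to its own output base (e.g. out_psm3) so the
    results of the different modes can be compared side by side.
    """
    return [
        ["tesseract", image, f"{out_prefix}_psm{mode}", flag, str(mode)]
        for mode in modes
    ]


def run_all(image, out_prefix, **kwargs):
    """Run every command from build_commands; needs tesseract on PATH."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    for cmd in build_commands(image, out_prefix, **kwargs):
        # check=False: keep going even if one mode fails on this image.
        subprocess.run(cmd, check=False)
```

Calling `run_all("index.png", "out")` would then run the four modes one after another; the output of each run can be inspected separately to see whether any mode keeps the columns apart.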

On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>
> Which page segmentation mode (psm) did you try?
>
> On 12 Oct 2016 11:21 p.m., "fuzzy7k" <kva...@gmail.com> wrote:
>
>> I have scanned some index pages that I would like to OCR for rapid 
>> searching. I am using tesseract from the command line. The problem is that 
>> tesseract ignores the whitespace between columns and merges everything 
>> together, essentially fragmenting the contents. The debug output shows 
>> that no "columns" are detected. Probably more important, three "blocks" 
>> are detected: one each around the first and last lines, and one 
>> encompassing everything in between. Is there a way to train block 
>> detection, or are there parameters that I can tweak to improve this?
>>
>> I have attached the image merely as an abstract representation of the 
>> text layout, to show the types of columns I am dealing with. It would 
>> also be nice to know whether tab stops can be trained and used to put 
>> each individual topic on one line. I could do that in post-processing if 
>> I could get the tab stops printed.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/5b4800f9-cead-4959-9260-52e98ee596b7%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

