negative

On Friday, October 14, 2016 at 3:29:53 AM UTC-4, shree wrote:
>
> You can also experiment with hocr and tsv output modes to see if they help.
>
> On 14 Oct 2016 2:53 a.m., "fuzzy7k" <kva...@gmail.com> wrote:
>
>> Going back to psm 3, I did find that setting textord_tabfind_find_tables 
>> to 0 helped, in that it draws only one box around the "block" of text 
>> instead of the three that I was first getting. The result is the same as 
>> with psm 6, but psm 6 should not run column detection, which I want to 
>> keep unless I can get tesseract to draw "blocks" vertically around the 
>> individual columns.
>>
>> On Thursday, October 13, 2016 at 8:30:05 PM UTC-4, fuzzy7k wrote:
>>>
>>> 6 gives the exact same results as 3 (i.e. no column separation). 11 & 12 
>>> are essentially the same in that they pull text from left to right, but 
>>> with three times as many newlines.
>>>
>>> On Thursday, October 13, 2016 at 8:21:09 AM UTC-4, shree wrote:
>>>>
>>>> Try psm 6, also 11, 12
>>>>
>>>> https://github.com/tesseract-ocr/tesseract/issues/434
>>>>
>>>> On 13 Oct 2016 1:13 p.m., "fuzzy7k" <kva...@gmail.com> wrote:
>>>>
>>>>> I tried psm 0-3
>>>>>
>>>>> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>>>>>>
>>>>>> Which page segmentation mode (psm) did you try?
>>>>>>
>>>>>> On 12 Oct 2016 11:21 p.m., "fuzzy7k" <kva...@gmail.com> wrote:
>>>>>>
>>>>>>> I have scanned some index pages that I would like to ocr for rapid 
>>>>>>> searching. I am using tesseract from the command line. The problem is 
>>>>>>> that tesseract ignores the whitespace between columns and merges 
>>>>>>> everything together, essentially fragmenting the contents. Using some 
>>>>>>> debug output I see that no "columns" are detected. Probably more 
>>>>>>> important is that three "blocks" are detected, one around the first 
>>>>>>> and last line, and one encompassing everything in between. Is there a 
>>>>>>> way to train block detection, or some parameters that I can tweak to 
>>>>>>> optimize this?
>>>>>>>
>>>>>>> I have attached the image merely as an abstract representation of 
>>>>>>> the text layout to show the types of columns I am dealing with. 
>>>>>>> Ideally, it would also be nice to know if tab stops can be trained and 
>>>>>>> used to put each individual topic on one line, which I could do in 
>>>>>>> post-processing if I could get the tab stops printed.
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5b4800f9-cead-4959-9260-52e98ee596b7%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5b4800f9-cead-4959-9260-52e98ee596b7%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>
>
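
For anyone who wants to try the tsv route suggested above, here is a minimal sketch of splitting the words into columns after the fact. It assumes the TSV came from something like `tesseract page.png out -c textord_tabfind_find_tables=0 tsv` and has the usual header (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The function name `split_columns` and the `min_gap` threshold of 40 pixels are hypothetical choices for illustration, not anything tesseract provides. It also assumes the gutter between columns is empty on every row passed in, so full-width lines (such as a first or last line spanning the page) would need to be filtered out first.

```python
import csv
import io
from collections import defaultdict


def split_columns(tsv_text, min_gap=40):
    """Group word rows of Tesseract TSV output into columns of text lines.

    min_gap is an assumed pixel threshold: a horizontal gap wider than
    this between word boxes is treated as a column gutter.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; skip everything else.
        if row.get("level") != "5" or not text:
            continue
        left, top = int(row["left"]), int(row["top"])
        key = (row["block_num"], row["par_num"], row["line_num"])
        words.append((left, left + int(row["width"]), top, key, text))
    if not words:
        return []

    # Sweep from left to right, starting a new column whenever the next
    # word begins more than min_gap past everything seen so far.
    words.sort(key=lambda w: w[0])
    columns = [[words[0]]]
    right_edge = words[0][1]
    for word in words[1:]:
        if word[0] - right_edge > min_gap:
            columns.append([])
        columns[-1].append(word)
        right_edge = max(right_edge, word[1])

    # Inside each column, reuse tesseract's line grouping and order the
    # lines from top to bottom, words from left to right.
    result = []
    for column in columns:
        lines = defaultdict(list)
        for left, _right, top, key, text in column:
            lines[key].append((left, top, text))
        ordered = sorted(lines.values(), key=lambda ws: min(w[1] for w in ws))
        result.append([" ".join(w[2] for w in sorted(ws)) for ws in ordered])
    return result
```

Calling `split_columns(open("out.tsv", encoding="utf-8").read())` returns one list of line strings per detected column, which can then be written out column by column for searching. The threshold will need tuning to the scan resolution.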
