You can also experiment with hocr and tsv output modes to see if they help.
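
In case it helps, here is a rough sketch of what you could do with the tsv output. It assumes you produced the file with something like `tesseract index_page.png index_page tsv` (the file names are made up) and that the TSV has the usual columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The helper names and the gap threshold are my own assumptions, not anything tesseract provides.

```python
import csv
import io


def parse_tsv_words(tsv_text):
    """Return the word-level rows (level 5) from tesseract TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        # Levels 1-4 describe page/block/paragraph/line boxes and carry no text.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        words.append({
            # Key that identifies the text line a word belongs to.
            "line": (row["block_num"], row["par_num"], row["line_num"]),
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "text": row["text"],
        })
    return words


def split_line_on_gaps(words, min_gap):
    """Split the words of ONE text line into cells wherever the horizontal
    gap between neighbouring words is wider than min_gap pixels."""
    cells, current, prev_right = [], [], None
    for w in sorted(words, key=lambda w: w["left"]):
        if prev_right is not None and w["left"] - prev_right > min_gap:
            cells.append(" ".join(current))
            current = []
        current.append(w["text"])
        prev_right = w["left"] + w["width"]
    if current:
        cells.append(" ".join(current))
    return cells
```

You would group the parsed words by their "line" key first and then call split_line_on_gaps on each group. The right min_gap depends on the scan resolution, so treat any value as something to tune by hand.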
On 14 Oct 2016 2:53 a.m., "fuzzy7k" wrote:
Going back to psm 3, I did find that textord_tabfind_find_tables 0 helped,
in that it draws only one box around the "block" of text, instead of the
three that I was first getting. This is obviously the same as psm 6, but
psm 6 should not run column detection, which is something that I want.
psm 6 gives the exact same results as psm 3 (i.e. no column separation).
psm 11 and 12 are essentially the same in that they pull text from left to
right, but with three times as many newlines.
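
For anyone wanting to repeat the comparison, here is a small sketch of how the runs above could be scripted. It only assembles the command line; the image and output names are hypothetical, and it assumes a build that accepts `--psm` (older 3.x builds spell it `-psm`). The `-c name=value` form is how a variable such as textord_tabfind_find_tables is set from the command line.

```python
import subprocess


def build_tesseract_command(image, outbase, psm, variables=None, configs=()):
    """Assemble: tesseract IMAGE OUTBASE --psm N [-c var=value ...] [config ...]"""
    cmd = ["tesseract", image, outbase, "--psm", str(psm)]
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    # Config names such as "hocr" or "tsv" go last.
    cmd += list(configs)
    return cmd


def run_tesseract(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


# Example (not executed here): one output file per page segmentation mode.
# for psm in (3, 6, 11, 12):
#     run_tesseract(build_tesseract_command(
#         "index_page.png", f"index_page_psm{psm}", psm,
#         {"textord_tabfind_find_tables": 0}))
```

Writing each mode to its own output base makes it easy to diff the results side by side.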
On Thursday, October 13, 2016 at 8:21:09 AM UTC-4, shree wrote:
Try psm 6, also 11, 12
https://github.com/tesseract-ocr/tesseract/issues/434
On 13 Oct 2016 1:13 p.m., "fuzzy7k" wrote:
I tried psm 0-3
On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
Which page segmentation mode (psm) did you try?
On 12 Oct 2016 11:21 p.m., "fuzzy7k" wrote:
I have scanned some index pages that I would like to ocr for rapid
searching. I am using tesseract from the command line. The problem is that
tesseract ignores the whitespace between columns and merges everything
together, essentially fragmenting the contents. Using some debug output I
see