subject:"\[tesseract\-ocr\] Failure to recognize columns"

Re: [tesseract-ocr] Failure to recognize columns

2016-10-14 Thread ShreeDevi Kumar

You can also experiment with hocr and tsv output modes to see if they help.

On 14 Oct 2016 2:53 a.m., "fuzzy7k" wrote:
> Going back to psm 3, I did find that textord_tabfind_find_tables 0 helped,
> in that it draws only one box around the "block" of text, instead of the
>
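If the tsv output is used, the word boxes it reports can be regrouped into columns after the fact. The following is a rough sketch of one such post-processing step, not something proposed in this thread. It assumes the usual TSV header (`level`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `width`, `text`, and so on), treats rows with `level` 5 as words, and starts a new column wherever no word covers a horizontal gap of at least `min_gap` pixels. Both `split_columns` and `min_gap` are hypothetical names, and the threshold would need tuning per scan.

```python
import csv
import io


def split_columns(tsv_text, min_gap=40):
    """Group word rows from tesseract TSV output into columns by x position."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if row.get("level") != "5" or not text:
            continue  # keep only non-empty word-level rows
        left = int(row["left"])
        right = left + int(row["width"])
        # Tesseract's own reading order, reused to order words inside a column.
        order = tuple(
            int(row[k]) for k in ("block_num", "par_num", "line_num", "word_num")
        )
        words.append((left, right, order, text))

    # Sweep from left to right. A new column starts when the next word begins
    # at least min_gap pixels past the right edge of everything seen so far.
    columns = []
    right_edge = None
    for left, right, order, text in sorted(words):
        if right_edge is None or left - right_edge >= min_gap:
            columns.append([])
            right_edge = right
        else:
            right_edge = max(right_edge, right)
        columns[-1].append((order, text))

    # Inside each column, fall back to tesseract's line and word numbering.
    return [[text for _, text in sorted(col)] for col in columns]
```

A heading or rule that spans the full page width would bridge the gap and merge the columns again, so such lines would have to be filtered out first.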

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread fuzzy7k

Going back to psm 3, I did find that textord_tabfind_find_tables 0 helped, in that it draws only one box around the "block" of text, instead of the three that I was first getting. This is obviously the same as psm 6, but psm 6 should not run column detection, which is something that I want
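For anyone wanting to reproduce this, a config variable such as `textord_tabfind_find_tables` can be set on the command line with `-c NAME=VALUE`. The snippet below is a minimal sketch that only assembles and prints the command. It assumes the Tesseract 4 and later flag spelling `--psm` (3.x releases use `-psm`), and `index_page.png` and the helper name `tesseract_args` are hypothetical.

```python
import shlex


def tesseract_args(image, outbase, psm=3, variables=None, psm_flag="--psm"):
    """Build the argv list for one tesseract run with optional -c settings."""
    args = ["tesseract", image, outbase, psm_flag, str(psm)]
    for name, value in (variables or {}).items():
        # Each config variable is passed as its own "-c NAME=VALUE" pair.
        args += ["-c", f"{name}={value}"]
    return args


if __name__ == "__main__":
    cmd = tesseract_args(
        "index_page.png",  # hypothetical scan
        "index_page",
        psm=3,
        variables={"textord_tabfind_find_tables": 0},
    )
    print(shlex.join(cmd))
```

Passing `psm_flag="-psm"` gives the older spelling.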

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread fuzzy7k

6 gives the exact same results as 3 (i.e. no column separation). 11 & 12 are essentially the same in that they pull text from left to right, but with three times as many newlines.

On Thursday, October 13, 2016 at 8:21:09 AM UTC-4, shree wrote:
>
> Try psm 6, also 11, 12
>
>

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread ShreeDevi Kumar

Try psm 6, also 11, 12

https://github.com/tesseract-ocr/tesseract/issues/434

On 13 Oct 2016 1:13 p.m., "fuzzy7k" wrote:
> I tried psm 0-3
>
> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>>
>> Which page segmentation mode (psm) did you try?
>>
>> On 12 Oct
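For reference, the modes mentioned in this thread can be compared side by side with a short script. This is a sketch only: the descriptions are paraphrased from tesseract's own psm help text, `index_page.png` and the `out_psmN` output names are hypothetical, and the flag is spelled `--psm` in Tesseract 4 and later but `-psm` in 3.x releases. The script prints the commands rather than running them.

```python
import shlex

# Page segmentation modes that come up in this thread (descriptions paraphrased).
PSM_MODES = {
    0: "orientation and script detection (OSD) only",
    1: "automatic page segmentation with OSD",
    2: "automatic page segmentation, no OSD or OCR",
    3: "fully automatic page segmentation, no OSD (default)",
    6: "assume a single uniform block of text",
    11: "sparse text, in no particular order",
    12: "sparse text with OSD",
}


def psm_command(image, psm, psm_flag="--psm"):
    """Return the argv list for one tesseract run in the given mode."""
    if psm not in PSM_MODES:
        raise ValueError(f"mode {psm} is not in the table above")
    # The output base name encodes the mode so results can be diffed later.
    return ["tesseract", image, f"out_psm{psm}", psm_flag, str(psm)]


def commands_for(image, modes=(3, 6, 11, 12)):
    """Build one command per mode to try."""
    return [psm_command(image, m) for m in modes]


if __name__ == "__main__":
    for cmd in commands_for("index_page.png"):
        print(f"# psm {cmd[-1]}: {PSM_MODES[int(cmd[-1])]}")
        print(shlex.join(cmd))
```

Modes 0 and 2 produce no recognized text at all, so of the 0-3 range only 1 and 3 are meaningful for comparing OCR output.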

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread fuzzy7k

I tried psm 0-3

On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>
> Which page segmentation mode (psm) did you try?
>
> On 12 Oct 2016 11:21 p.m., "fuzzy7k" wrote:
>
>> I have scanned some index pages that I would like to ocr for rapid
>> searching. I am

Re: [tesseract-ocr] Failure to recognize columns

2016-10-12 Thread ShreeDevi Kumar

Which page segmentation mode (psm) did you try?

On 12 Oct 2016 11:21 p.m., "fuzzy7k" wrote:
> I have scanned some index pages that I would like to ocr for rapid
> searching. I am using tesseract from the command line. The problem is that
> tesseract ignores the whitespace

[tesseract-ocr] Failure to recognize columns

2016-10-12 Thread fuzzy7k

I have scanned some index pages that I would like to ocr for rapid searching. I am using tesseract from the command line. The problem is that tesseract ignores the whitespace between columns and merges everything together, essentially fragmenting the contents. Using some debug output I see


7 matches
