Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread fuzzy7k
psm 6 gives the exact same results as psm 3 (i.e. no column separation). psm 11 and 12 are essentially the same in that they pull text from left to right, but with three times as many newlines.

On Thursday, October 13, 2016 at 8:21:09 AM UTC-4, shree wrote:
> Try psm 6, also 11, 12

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread fuzzy7k
Going back to psm 3, I did find that textord_tabfind_find_tables 0 helped, in that it draws only one box around the "block" of text, instead of the three that I was first getting. This is obviously the same as psm 6, but psm 6 should not run column detection, which is something that I want
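
The run described in this message (psm 3 with textord_tabfind_find_tables set to 0) could be invoked roughly as sketched below. This is only an illustration: index.png and the output base names are placeholders, the helper tesseract_command is hypothetical, and the spelling of the psm option depends on the Tesseract version (-psm in 3.x, --psm in 4.0 and later). The helper only builds the argument list and does not run anything.

```python
def tesseract_command(image, output_base, psm, variables=None, legacy_flag=True):
    """Build an argv list for the tesseract CLI (nothing is executed here)."""
    # Tesseract 3.x spells the option -psm; 4.0 and later use --psm.
    psm_flag = "-psm" if legacy_flag else "--psm"
    cmd = ["tesseract", image, output_base, psm_flag, str(psm)]
    # Each "-c NAME=VALUE" pair sets one config variable for this run.
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# Placeholder file names; the modes are the ones tried in this thread.
for psm in (3, 6, 11, 12):
    # Table finding is only switched off for the psm 3 run, as in the message.
    variables = {"textord_tabfind_find_tables": 0} if psm == 3 else None
    print(" ".join(tesseract_command("index.png", f"out_psm{psm}", psm, variables)))
```

The resulting list can be passed to subprocess.run to execute it once the file names match your own scans.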

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread fuzzy7k
I tried psm 0-3

On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
> Which page segmentation mode (psm) did you try?
>
> On 12 Oct 2016 11:21 p.m., "fuzzy7k" wrote:
>> I have scanned some index pages that I would like to ocr for rapid
>> searching. I am

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread ShreeDevi Kumar
Try psm 6, also 11, 12

https://github.com/tesseract-ocr/tesseract/issues/434

On 13 Oct 2016 1:13 p.m., "fuzzy7k" wrote:
> I tried psm 0-3
>
> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>> Which page segmentation mode (psm) did you try?
>>
>> On 12 Oct

[tesseract-ocr] Re: hOCR bbox viewer?

2016-10-13 Thread Zeth Weissman
Better late than never, but I found this tool that will do what you want.

http://www.primaresearch.org/tools/PAGEViewer

You just need to rename your hOCR or HTML file (the extension depends on the version of Tesseract) to .xml.

On Sunday, October 6, 2013 at 3:26:58 PM UTC-4, matthew christy wrote:
> Does
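
The rename step mentioned in this message can be scripted. The sketch below is one way to do it, with the assumption (mine, not the poster's) that copying is preferable to renaming so the original Tesseract output is kept; copy_as_xml is a hypothetical helper name and page.hocr is a placeholder path.

```python
import shutil
from pathlib import Path


def copy_as_xml(hocr_path):
    """Copy an hOCR output file (.hocr or .html) next to itself with a .xml suffix."""
    src = Path(hocr_path)
    # with_suffix swaps only the final extension: page.hocr -> page.xml
    dst = src.with_suffix(".xml")
    # Copy rather than rename, so the original file stays untouched.
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    example = Path("page.hocr")  # placeholder path
    if example.exists():
        print(copy_as_xml(example))
```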