psm 6 gives exactly the same results as psm 3 (i.e. no column separation).
psm 11 and 12 are essentially the same in that they pull text from left to
right, but with three times as many newlines.
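For anyone repeating this comparison, here is a minimal sketch that runs the same image through each psm value mentioned in this thread and counts total and blank output lines, which makes differences like the extra newlines from psm 11/12 easy to see. It assumes the `tesseract` CLI is on PATH and uses Tesseract 4+ option syntax (`--psm`; 3.x releases spelled it `-psm`). The function names and `index_page.png` are hypothetical.

```python
import shutil
import subprocess

# psm values discussed in this thread (descriptions paraphrase `tesseract --help-psm`)
PSM_MODES = {
    3: "Fully automatic page segmentation, but no OSD (default)",
    6: "Assume a single uniform block of text",
    11: "Sparse text: find as much text as possible in no particular order",
    12: "Sparse text with OSD",
}


def psm_command(image: str, psm: int) -> list[str]:
    """Build a tesseract command line that writes the OCR text to stdout."""
    # Assumes Tesseract 4+ syntax (--psm); 3.x used -psm
    return ["tesseract", image, "stdout", "--psm", str(psm)]


def line_stats(text: str) -> tuple[int, int]:
    """Return (total lines, blank lines) for one OCR result."""
    lines = text.splitlines()
    blank = sum(1 for line in lines if not line.strip())
    return len(lines), blank


def compare_modes(image: str) -> None:
    """Run every mode in PSM_MODES on one image and print line counts."""
    if shutil.which("tesseract") is None:
        print("tesseract not found on PATH; skipping")
        return
    for psm, desc in PSM_MODES.items():
        result = subprocess.run(psm_command(image, psm),
                                capture_output=True, text=True, check=True)
        total, blank = line_stats(result.stdout)
        print(f"psm {psm:>2} ({desc}): {total} lines, {blank} blank")
```

Usage: `compare_modes("index_page.png")`. Counting blank lines separately is a crude way of telling "same text, more spacing" apart from "different text order".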
On Thursday, October 13, 2016 at 8:21:09 AM UTC-4, shree wrote:
>
> Try psm 6, also 11, 12
>
>
Going back to psm 3, I did find that setting textord_tabfind_find_tables to 0
helped, in that it draws only one box around the "block" of text instead of
the three I was getting at first. The result is obviously the same as psm 6,
but psm 6 should not run column detection, which is something I want.
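A minimal sketch of the setting described above, assuming the `tesseract` CLI is on PATH and Tesseract 4+ option syntax. It keeps psm 3, passes `textord_tabfind_find_tables=0` with `-c`, and adds the `hocr` config so the resulting block boxes can be inspected. The helper names are hypothetical.

```python
import shutil
import subprocess


def no_tables_command(image: str, outbase: str, psm: int = 3) -> list[str]:
    """Build a tesseract command: psm 3 layout analysis with table finding off."""
    return [
        "tesseract", image, outbase,
        # Assumes Tesseract 4+ option syntax; 3.x releases spelled this -psm
        "--psm", str(psm),
        # Set a config variable on the command line: 0 turns off table detection
        "-c", "textord_tabfind_find_tables=0",
        # Config file name: also write hOCR output so the block boxes can be inspected
        "hocr",
    ]


def run_no_tables(image: str, outbase: str) -> None:
    """Run the command above; raises if tesseract is missing or fails."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    subprocess.run(no_tables_command(image, outbase), check=True)
```

Options go after the output base and before config names such as `hocr`, which matches the CLI's `tesseract image outputbase [options...] [configfile...]` usage.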
I tried psm 0-3
On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>
> Which page segmentation mode (psm) did you try?
>
> On 12 Oct 2016 11:21 p.m., "fuzzy7k" wrote:
>
>> I have scanned some index pages that I would like to ocr for rapid
>> searching. I am
Try psm 6, also 11, 12
https://github.com/tesseract-ocr/tesseract/issues/434
On 13 Oct 2016 1:13 p.m., "fuzzy7k" wrote:
> I tried psm 0-3
>
> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>>
>> Which page segmentation mode (psm) did you try?
>>
>> On 12 Oct
Better late than never, but I found this tool that will do what you want:
http://www.primaresearch.org/tools/PAGEViewer
You just need to rename your hOCR or HTML file (depending on your version of
Tesseract) so that it has an .xml extension.
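A minimal sketch of that renaming step. It copies rather than renames, so the original hOCR output is kept. It assumes the file was produced with the `hocr` config (e.g. `tesseract page.png page hocr`), which writes `page.hocr` in recent releases and `page.html` in older ones. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_as_xml(hocr_path: str) -> Path:
    """Copy a Tesseract hOCR output (.hocr or .html) to a sibling .xml file."""
    src = Path(hocr_path)
    dst = src.with_suffix(".xml")  # page.hocr -> page.xml, page.html -> page.xml
    # Copy instead of rename so the original hOCR file stays in place
    shutil.copyfile(src, dst)
    return dst
```

Usage: `copy_as_xml("page.hocr")` returns the path of the new `page.xml`, which can then be opened in the viewer.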
On Sunday, October 6, 2013 at 3:26:58 PM UTC-4, matthew christy wrote:
>
> Does