It's less than elegant, but works
convert -draw "line 800,0 800,1" -draw "line 1500,0 1500,1" index-3.pnm x.pnm
On Sunday, October 23, 2016 at 9:35:21 PM UTC-4, fuzzy7k wrote:
>
> Well, I have used ocrfeeder to draw up columns individually, but that is a
> l
into
how to do that with convert.
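The convert call above hard-codes two x positions. As a rough sketch only, the same argument list could be generated for any number of column boundaries. The function name and the y0/y1 defaults are hypothetical; the `-draw "line x0,y0 x1,y1"` syntax and the argument order are copied from the command in the post, and nothing here runs ImageMagick itself.

```python
import shlex


def build_separator_command(xs, src, dst, y0=0, y1=1):
    """Build an ImageMagick `convert` argv with one -draw line per x position.

    Hypothetical helper: it mirrors the command from the post, where each
    line runs from (x, y0) to (x, y1). It only assembles arguments.
    """
    cmd = ["convert"]
    for x in xs:
        cmd += ["-draw", f"line {x},{y0} {x},{y1}"]
    # Same ordering as the post: draw operations first, then input and output.
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    argv = build_separator_command([800, 1500], "index-3.pnm", "x.pnm")
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(a) for a in argv))
```

The list can be passed to subprocess.run() once the positions look right; whether a line that short is enough to split the columns depends on the scan.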
I like the histogram idea. That sounds like a good feature request.
On Saturday, October 15, 2016 at 9:49:20 PM UTC-4, Tom Morris wrote:
>
> On Wednesday, October 12, 2016 at 5:21:17 PM UTC-4, fuzzy7k wrote:
>>
>> I have scanned some index pages
hat I want
unless I can get tesseract to draw "blocks" vertically around the
individual columns.
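The thread does not spell out the histogram idea, so the following is only one plausible reading: count the ink pixels in each pixel column and treat wide runs of empty columns as gutters between text columns. The function name, the min_width threshold, and the list-of-lists image format are all assumptions made for illustration; this is not how tesseract itself finds columns.

```python
def find_gutters(rows, min_width=3):
    """Return (start, end) x-ranges (end exclusive) of empty column runs.

    rows: list of equal-length lists, 1 = ink, 0 = background (assumed format).
    Runs touching the left or right page edge are treated as margins and skipped.
    """
    if not rows:
        return []
    width = len(rows[0])
    # Vertical projection: number of ink pixels in each pixel column.
    hist = [sum(row[x] for row in rows) for x in range(width)]
    gutters = []
    start = None
    # A non-zero sentinel at the end closes any run that reaches the right edge.
    for x, count in enumerate(hist + [1]):
        if count == 0:
            if start is None:
                start = x
        elif start is not None:
            interior = start > 0 and x < width
            if interior and x - start >= min_width:
                gutters.append((start, x))
            start = None
    return gutters


if __name__ == "__main__":
    row = [0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0]
    print(find_gutters([row, row]))
```

The midpoint of each returned range would be a candidate x position for a separator line or a crop boundary. Real scans are noisy, so a small non-zero ink threshold may work better than requiring exactly zero.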
On Thursday, October 13, 2016 at 8:30:05 PM UTC-4, fuzzy7k wrote:
>
> 6 gives the exact same results as 3 (i.e. no column separation). 11 & 12
> are essentially the same in that
https://github.com/tesseract-ocr/tesseract/issues/434
>
> On 13 Oct 2016 1:13 p.m., "fuzzy7k" <kva...@gmail.com >
> wrote:
>
>> I tried psm 0-3
>>
>> On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>>>
>>> Which page segmen
I tried psm 0-3
On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote:
>
> Which page segmentation mode (psm) did you try?
>
> On 12 Oct 2016 11:21 p.m., "fuzzy7k" <kva...@gmail.com >
> wrote:
>
>> I have scanned some index pages that I would l
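To compare page segmentation modes side by side, as the replies above suggest, the runs can be scripted. This is a minimal sketch: the helper name and the output base names are hypothetical, and the flag spelling is an assumption that depends on the installed version (Tesseract 3.x takes -psm, 4.x and later take --psm). The image name is the one used elsewhere in the thread.

```python
def psm_commands(image, modes, flag="-psm"):
    """Build one tesseract argv per page segmentation mode.

    Hypothetical helper. Output goes to out-psm<N>.txt, since tesseract
    appends .txt to the output base. Pass flag="--psm" for Tesseract 4+.
    """
    return [
        ["tesseract", image, f"out-psm{mode}", flag, str(mode)]
        for mode in modes
    ]


if __name__ == "__main__":
    # Modes mentioned in the thread: 3, 6, 11 and 12.
    for argv in psm_commands("index-3.pnm", [3, 6, 11, 12]):
        print(" ".join(argv))
```

Each list can be handed to subprocess.run(), and the resulting text files diffed to see whether any mode keeps the columns apart.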
I have scanned some index pages that I would like to ocr for rapid
searching. I am using tesseract from the command line. The problem is that
tesseract ignores the whitespace between columns and merges everything
together, essentially fragmenting the contents. Using some debug output I
see
My earlier successes were definitely font related. Use a blacklist, or
whitelist
-c tessedit_char_blacklist=ﬁﬂ
https://groups.google.com/d/topic/tesseract-ocr/jO_4ZMMK9xw/discussion
On Saturday, September 3, 2016 at 1:45:21 PM UTC-4, fuzzy7k wrote:
>
> It's a language thing:
It's a language thing: https://en.wikipedia.org/wiki/Typographic_ligature
Try specifying a particular language?
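If ligature characters such as ﬁ (U+FB01) and ﬂ (U+FB02) still end up in the OCR output, one alternative to blacklisting them is to expand them afterwards so plain-text searches match. This is a sketch of that post-processing step, not something proposed in the thread; the mapping covers only these two ligatures and the function name is hypothetical.

```python
# Assumed mapping: only the two ligatures discussed here; extend as needed.
LIGATURES = {
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",  # LATIN SMALL LIGATURE FL
}


def expand_ligatures(text):
    """Replace each known ligature character with its plain-letter spelling."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text


if __name__ == "__main__":
    print(expand_ligatures("\ufb01nd the \ufb02ow"))
```

An explicit table is used instead of a blanket Unicode normalization so that no other characters in the text are altered.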
This parameter seems like a possible association (due to the description
containing glyph):
segment_penalty_dict_nonword    1.25    Score multiplier for glyph fragment
segmentations
I found the function that puts everything on the table, with regard to the
scrollview blob debug window...
ccstruct/blobbox.cpp:
ScrollView::Color BLOBNBOX::TextlineColor(BlobRegionType region_type,
                                          BlobTextFlowType flow_type) {
  switch (region_type) {
Every so often I get a page where one line on the whole page is
not recognized. I think I've tracked the problem to blob recognition, but
don't know where to go from here. The attached images are of an index page
and they are obtained using textord_tabfind_show_images. The line that is