RE: [tesseract-ocr] Re: Works perfectly...except skips several lines

2016-12-02 Thread Art Rhyno
The source image is really large, maybe try downsizing it to 800x450 or so. art
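A rough sketch of what "downsizing to 800x450 or so" could mean in practice: scale the image to fit inside that box while keeping its aspect ratio. The function name `fit_within` and the example source dimensions are hypothetical (the thread does not state the original image size); the resulting size would then be passed to whatever image tool is used for the actual resize.

```python
def fit_within(width, height, max_width=800, max_height=450):
    """Return (new_width, new_height) scaled to fit inside the box.

    The aspect ratio is preserved and images that already fit are
    left unchanged (the scale factor is capped at 1.0).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    # Guard against a dimension rounding down to zero on extreme ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # Hypothetical source size; the real image size is not given in the thread.
    print(fit_within(3200, 1800))
```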

[tesseract-ocr] Re: Segmentation without orientation-detection -- Shouldn't "-psm 3" skip OSD?

2016-12-02 Thread S
Strangely, I've had some luck by removing the "-psm 3" argument and using "-c tessedit_pageseg_mode=3" instead. There's no more output from OSD, and I'm getting mostly good hOCR from the problem pages. As a side note, it looks like the output from tesseract --print-parameters may be out of
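For anyone scripting this, here is a minimal sketch of the two invocations being compared, built as argument lists so they can be handed to `subprocess.run`. It only assembles the commands and does not run tesseract. The helper name `build_hocr_command` is hypothetical; the flags and file paths are the ones quoted in this thread (tesseract 3.04.1), and the placement of `-c` before the `hocr` config name is an assumption based on the poster swapping one argument for the other.

```python
import shlex


def build_hocr_command(image, out_base, lang="eng", use_config_var=True):
    """Build a tesseract command line that requests hOCR output.

    use_config_var=True  -> "-c tessedit_pageseg_mode=3" (the workaround)
    use_config_var=False -> "-psm 3" (the original invocation)
    """
    cmd = ["tesseract", image, out_base, "-l", lang]
    if use_config_var:
        cmd += ["-c", "tessedit_pageseg_mode=3"]
    else:
        cmd += ["-psm", "3"]
    cmd.append("hocr")  # config file name selecting hOCR output
    return cmd


if __name__ == "__main__":
    for flag in (False, True):
        argv = build_hocr_command(
            "files/test-cases0001.tif", "files/test1-psm3", use_config_var=flag
        )
        # shlex.quote keeps the printed line safe to paste into a shell.
        print(" ".join(shlex.quote(a) for a in argv))
```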

[tesseract-ocr] Re: Works perfectly...except skips several lines

2016-12-02 Thread S
Just a guess, but it looks like the baseline / text angle isn't consistent on those omitted lines. E.g. in "*S*ome of the resistance *d*uring," the bottom of the *S* is noticeably higher than the bottom of the *d*, but by the last three words, there's no noticeable slope. By the "no replacements"

[tesseract-ocr] Re: Works perfectly...except skips several lines

2016-12-02 Thread Andrew J Freyer
I can confirm I am experiencing the same issue described above. Entire lines in (what should be) very readable images are skipped consistently. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving

Re: [tesseract-ocr] Re: Tesseract Trainer GUI for GNU/Linux

2016-12-02 Thread Nalin Linux
On Thursday, November 10, 2016 at 1:33:49 PM UTC+5:30, peiman F. wrote:
> is this work on cube and rtl languages!?
Will be enabled soon. Latest updates listed below 1 Training with font_properties enabled 2 Training Image zoom strengthened 3 Dictionary editing enabled 4 progress-bar

Re: [tesseract-ocr] Anybody can help on how to recognize this kind of image?

2016-12-02 Thread Allistair
I did a lot of work experimenting with trying to recognise this kind of text and never got it to a satisfactory level - try Google Cloud Vision.
On 1 December 2016 at 23:44, Ni Min wrote:
> I have download all eng related training data from github and tried with
> the

[tesseract-ocr] Anybody can help on how to recognize this kind of image?

2016-12-02 Thread Ni Min
I have downloaded all the eng-related training data from GitHub and tried the following command, but it can't recognize the image successfully. I am new to tesseract. Can anybody give some hints? Thanks.
tesseract --tessdata-dir ~/tessdata test11.jpg result -l eng -psm 3

[tesseract-ocr] Segmentation without orientation-detection -- Shouldn't "-psm 3" skip OSD?

2016-12-02 Thread S
*The immediate problem:* I'm using tesseract (3.04.1) via command-line to generate hOCR as follows:
tesseract files/test-cases0001.tif files/test1-psm3 -l eng -psm 3 hocr
But I'm seeing output that seems to be from OSD:
> OSD: Weak margin (2.88) for 29022 blob text block, but using orientation