https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
For the ppa

On 25-Aug-2017 12:45 AM, "ShreeDevi Kumar" <[email protected]> wrote:

> There is an unofficial ppa package available with latest code, if you do
> not want to build it.
>
> -- Excuse the brevity, msg sent from phone.
>
> On 25-Aug-2017 12:41 AM, "ShreeDevi Kumar" <[email protected]> wrote:
>
>> You can try building latest GitHub source for 4.0alpha and test with the
>> best/eng.traineddata from the tessdata repository.
>>
>> -- Excuse the brevity, msg sent from phone.
>>
>> On 25-Aug-2017 12:36 AM, "Clinton Graham" <[email protected]> wrote:
>>
>>> Do you have any simple suggestions for improving OCR quality where
>>> tesseract is missing single character words like "a" and "I"?
>>>
>>> I'm using the default packages available in Ubuntu:
>>> tesseract 3.03
>>> leptonica-1.70
>>> libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib
>>> 1.2.8 : webp 0.4.0
>>>
>>> I've also tried updating Ubuntu, building later 3.x sources:
>>> tesseract 3.05.01
>>> leptonica-1.74.4
>>> libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 :
>>> zlib 1.2.8
>>>
>>> I'm using a command line run of simply:
>>> tesseract -psm 1 -l eng $f $f pdf
>>>
>>> I've also tried -psm 6 based on another forum post (though some of my
>>> input will be multicolumn).
>>>
>>> In whatever case, the first paragraph of my TIFF (attached) is
>>> consistently read without instances of single character words:
>>>
>>>> Honors Award {Presentation to Robert H. Ivy, M.D., D.D.S., Sc.D.,
>>>> F_‘.A.C.S. At the business meeting .of the American Cleft Palate
>>>> Association on May 6, 1961 in Montreal, Canada, an Honors and Awards
>>>> Committee was established and its duties were set forth. The Executive
>>>> Committee then selected Dr. Robert Ivy to be the first recipient of an
>>>> Honors Award. An HOnors and Awards Committee was then selected by the
>>>> President; serve as the current chairman. It therefore becomes personal
>>>> honor and privilege to me to be able to present this first award to good
>>>> friend. Dr. Ivy has had long and brilliant career in the field of plastic
>>>> surgery with particular interest in the cleft lip and palate patient. It
>>>> will be possible for us to mention only very few of Dr. Ivy’s many
>>>> accomplishments in our allotted time here today. would, therefore, like to
>>>> recommend to you two publications which will give you more insight into the
>>>> life of our honored guest.
>>>
>>> I'm hoping this sample and description are also representative of other
>>> dropped characters, such as single numerals in pagination and single
>>> initials in some instances.
>>>
>>> Unfortunately, I don't have a lot of time to devote to this project, so
>>> anything easy and obvious which I'm missing?
>>>
>>> Thanks,
>>>
>>> - Clinton Graham
>>>
>>> Systems Developer
>>> University of Pittsburgh | University Library System
>>> 412-383-1057
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/e0b62d2b-2e27-4732-b4fe-8d5b78c52d98%40googlegroups.com
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
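When comparing the 3.x output against a 4.0alpha run, it may help to measure the symptom rather than eyeball it. Below is a minimal sketch, not part of tesseract itself: the helper name `single_char_words` and the regular expression are assumptions made for illustration, and the sample string is an excerpt of the OCR output quoted in the thread. It lists standalone one-character tokens, so an empty result on a page of English prose suggests that words like "a" and "I" are still being dropped.

```python
import re

# Excerpt of the OCR output quoted in the thread, copied verbatim.
OCR_SAMPLE = (
    "It therefore becomes personal honor and privilege to me to be able to "
    "present this first award to good friend. Dr. Ivy has had long and "
    "brilliant career in the field of plastic surgery with particular "
    "interest in the cleft lip and palate patient."
)


def single_char_words(text):
    """Return every standalone one-character letter or digit in text.

    Hypothetical helper for illustration. The lookarounds reject
    characters that touch another word character or an apostrophe, so
    the "s" in "Ivy's" is not counted. Initials followed by a period
    (for example "H.") are counted, since the thread mentions dropped
    initials as well.
    """
    return re.findall(r"(?<![\w'’])[A-Za-z0-9](?![\w'’])", text)


if __name__ == "__main__":
    found = single_char_words(OCR_SAMPLE)
    print(f"{len(found)} single-character tokens: {found}")
```

Running the same check on the text produced by each tesseract version or `psm` setting gives a quick number to compare. Note that the count includes initials and page numerals, so it is only a rough indicator.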

