https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr

That's the link for the PPA.
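
A minimal sketch of installing from the PPA linked above. It assumes an Ubuntu release that the PPA publishes packages for, and that `add-apt-repository` (from `software-properties-common`) is available. The PPA id `ppa:alex-p/tesseract-ocr` is read off the Launchpad URL, and the wrapper function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install tesseract from the unofficial PPA.
# Assumption: the PPA id below is derived from the Launchpad URL
# https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
install_tesseract_from_ppa() {
  sudo add-apt-repository ppa:alex-p/tesseract-ocr &&  # register the PPA
    sudo apt-get update &&                             # refresh package lists
    sudo apt-get install tesseract-ocr &&              # install or upgrade the package
    tesseract --version                                # show which build is now on PATH
}
```

After sourcing the snippet, run `install_tesseract_from_ppa`. The final `tesseract --version` shows whether the PPA build has replaced Ubuntu's stock 3.03 package.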

On 25-Aug-2017 12:45 AM, "ShreeDevi Kumar" <[email protected]> wrote:

> There is an unofficial PPA available with the latest code, if you do not
> want to build it.
>
> -- Excuse the brevity, msg sent from phone.
>
> On 25-Aug-2017 12:41 AM, "ShreeDevi Kumar" <[email protected]> wrote:
>
>> You can try building the latest GitHub source for 4.0 alpha and testing
>> with best/eng.traineddata from the tessdata repository.
>>
>> -- Excuse the brevity, msg sent from phone.
>>
>> On 25-Aug-2017 12:36 AM, "Clinton Graham" <[email protected]> wrote:
>>
>>> Do you have any simple suggestions for improving OCR quality where
>>> tesseract is missing single character words like "a" and "I"?
>>>
>>> I'm using the default packages available in Ubuntu:
>>> tesseract 3.03
>>>  leptonica-1.70
>>>   libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0
>>>
>>> I've also tried updating Ubuntu and building later 3.x sources:
>>> tesseract 3.05.01
>>>  leptonica-1.74.4
>>>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
>>>
>>> I'm simply running this on the command line:
>>> tesseract -psm 1 -l eng $f $f pdf
>>>
>>> I've also tried -psm 6 based on another forum post (though some of my
>>> input will be multicolumn).
>>>
>>> In every case, the first paragraph of my TIFF (attached) is
>>> consistently read with its single-character words missing:
>>>
>>> Honors Award {Presentation to Robert H. Ivy, M.D., D.D.S., Sc.D.,
>>>> F_‘.A.C.S. At the business meeting .of the American Cleft Palate
>>>> Association on May 6, 1961 in Montreal, Canada, an Honors and Awards
>>>> Committee was established and its duties were set forth. The Executive
>>>> Committee then selected Dr. Robert Ivy to be the first recipient of an
>>>> Honors Award. An HOnors and Awards Committee was then selected by the
>>>> President; serve as the current chairman. It therefore becomes personal
>>>> honor and privilege to me to be able to present this first award to good
>>>> friend. Dr. Ivy has had long and brilliant career in the field of plastic
>>>> surgery with particular interest in the cleft lip and palate patient. It
>>>> will be possible for us to mention only very few of Dr. Ivy’s many
>>>> accomplishments in our allotted time here today. would, therefore, like to
>>>> recommend to you two publications which will give you more insight into the
>>>> life of our honored guest.
>>>>
>>>
>>> I'm hoping this sample and description are also representative of other
>>> dropped characters, such as single numerals in pagination and, in some
>>> instances, single initials.
>>>
>>> Unfortunately, I don't have a lot of time to devote to this project, so
>>> is there anything easy and obvious that I'm missing?
>>>
>>> Thanks,
>>>
>>> - Clinton Graham
>>>
>>> Systems Developer
>>>
>>> University of Pittsburgh | University Library System
>>>
>>> 412-383-1057
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/e0b62d2b-2e27-4732-b4fe-8d5b78c52d98%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
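
The suggestion to test a 4.0 alpha build with best/eng.traineddata can be applied to the same TIFFs with a loop like the one below. This is a minimal sketch. It assumes the 4.x build is on PATH as `tesseract` and that eng.traineddata has been downloaded into a local directory passed as the second argument. The function name is hypothetical. Note that 4.x spells the page segmentation option `--psm`, while the 3.0x releases used in the thread take `-psm`. Depending on the build, PDF output may also expect `pdf.ttf` in the same tessdata directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OCR every TIFF in a directory to a searchable PDF
# using a 4.x build and a locally downloaded best/eng.traineddata.
# Assumptions: the 4.x binary is on PATH as "tesseract", and $2 names a
# directory that contains eng.traineddata.
ocr_tiffs_to_pdf() {
  local src_dir="$1"
  local tessdata_dir="$2"
  if [ ! -d "$src_dir" ]; then
    echo "no such directory: $src_dir" >&2
    return 1
  fi

  local f
  for f in "$src_dir"/*.tif "$src_dir"/*.tiff; do
    [ -e "$f" ] || continue  # an unmatched glob stays literal; skip it
    # Same shape as the 3.0x command in the thread ("$f $f pdf" writes
    # "$f.pdf"), but with the 4.x spelling --psm.
    tesseract "$f" "$f" --tessdata-dir "$tessdata_dir" -l eng --psm 1 pdf || return 1
  done
}
```

For example, `ocr_tiffs_to_pdf ./scans ./tessdata_best` writes `page.tif.pdf` next to each `page.tif`. This makes it easy to compare the output against the 3.05 results, checking whether "a" and "I" now survive.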
