Please see https://github.com/tesseract-ocr/tesseract/issues/681#issuecomment-303027906
You can try changing those constants to see if you get any improvement.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Jul 18, 2017 at 11:32 PM, Chris Hawley <[email protected]> wrote:

> The file that i am running OCR on
>
> https://drive.google.com/file/d/0B-iKKP8eIvdgZkhObUVXUVJ1N28/view?usp=sharing
>
> Before anyone asks, it's part of the CIA's Crest Dataset. I noticed
> tesseract seems to skip over some text. The command that I am using is
>
> E:\Tesseract\build\bin\Release\tesseract.exe --psm 1 --oem 1 "D:\split\Folder 001\1946-06-21.tiff" test.txt
>
> The output is
>
> 21 June 1946
>
> MEMORANDUM For SUPERVISING AGENT,
> U. S. SECRET SERVICE,
> WHITE Hous®.
>
> 1. - It is requested that a White House pass be issued to
> Lieutenant General Hoyt S. VANDENBERG, Director of Central Intel-
>
> ligence.
>
> 2. - In connection with his official duties, it is necessary
> for General Vandenberg to visit the White House frequently,.
>
> 3% His physical description is:
>
> Height =-- 6 feet.
> Hair «-- _ @FAY ,
> Eyes -- _- blue.
>
> Enclosed herewith is his photograph.
>
> THOMAS F, CULLEN
> Captain, USNR
> Asgistant to the Director.
>
> if you notice, it skips over the "weight -- 165 lbs" line. I wasn't sure
> if this qualified as a bug. Is there anything that I can do to improve the
> results so that line is included?
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduU2e8n1A5Jvj7DrTP4gh2k8kr%3DqYOL9jxLXfr9fhiRiqQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
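
A separate thing that might be worth trying, independent of the constants mentioned in the linked issue, is to rerun the same page under a few different --psm values and check which runs contain the "Weight" line. The following is only a rough sketch: the helper names (build_command, missing_phrases, compare_psm_modes) are made up for illustration, the list of modes to try is an arbitrary choice, and it assumes the tesseract build accepts "stdout" as the output base so the text can be captured directly. Whether any mode actually recovers the skipped line has not been checked.

```python
import subprocess


def build_command(exe, image, psm, oem=1):
    """Build a tesseract command line that writes recognized text to stdout."""
    # "stdout" as the output base asks tesseract to print the text
    # instead of writing <outputbase>.txt.
    return [exe, image, "stdout", "--psm", str(psm), "--oem", str(oem)]


def missing_phrases(text, expected):
    """Return the expected phrases that do not occur in text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in expected if phrase.lower() not in lowered]


def run_ocr(exe, image, psm, oem=1):
    """Run tesseract once and return the recognized text."""
    result = subprocess.run(
        build_command(exe, image, psm, oem),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def compare_psm_modes(exe, image, expected, modes=(1, 3, 4, 6)):
    """Map each page segmentation mode to the expected phrases it missed."""
    # The default modes are an illustrative selection, not a recommendation.
    return {psm: missing_phrases(run_ocr(exe, image, psm), expected) for psm in modes}
```

With the paths from the original message, this could be called as compare_psm_modes(r"E:\Tesseract\build\bin\Release\tesseract.exe", r"D:\split\Folder 001\1946-06-21.tiff", ["Height", "Weight", "Hair", "Eyes"]). A mode whose list is empty found all four labels. Matching is a plain substring check, so a label that OCR misreads will also show up as missing.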

