The file that I am running OCR on: https://drive.google.com/file/d/0B-iKKP8eIvdgZkhObUVXUVJ1N28/view?usp=sharing

Before anyone asks, it's part of the CIA's CREST dataset. I noticed that Tesseract seems to skip over some text. The command that I am using is:

    E:\Tesseract\build\bin\Release\tesseract.exe --psm 1 --oem 1 "D:\split\Folder 001\1946-06-21.tiff" test.txt

The output is:

    21 June 1946 MEMORANDUM For SUPERVISING AGENT, U. S. SECRET SERVICE, WHITE Hous®. 1. - It is requested that a White House pass be issued to Lieutenant General Hoyt S. VANDENBERG, Director of Central Intel- ligence. 2. - In connection with his official duties, it is necessary for General Vandenberg to visit the White House frequently,. 3% His physical description is: Height =-- 6 feet. Hair «-- _ @FAY , Eyes -- _- blue. Enclosed herewith is his photograph. THOMAS F, CULLEN Captain, USNR Asgistant to the Director.

As you can see, it skips over the "weight -- 165 lbs" line, which should appear in the physical description. I wasn't sure whether this qualifies as a bug. Is there anything I can do to improve the results so that the line is included?
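
In case it helps anyone reproduce this, here is a minimal sketch of how the same image could be run through several page segmentation modes to see whether any of them picks up the missing line. I have not confirmed that changing --psm fixes the problem; it is only a way to check. The helper names (build_command, contains_phrase, compare_modes) and the list of modes are my own choices for illustration, and the sketch assumes a tesseract executable is on the PATH. It uses "stdout" as the output base so that Tesseract prints the text instead of writing a file.

```python
import subprocess

# Page segmentation modes to compare. These are standard --psm values:
# 1 = automatic with OSD, 3 = fully automatic (no OSD),
# 4 = single column of text, 6 = single uniform block of text.
PSM_MODES = (1, 3, 4, 6)


def build_command(image_path, psm, oem=1, exe="tesseract"):
    """Build the argument list for one Tesseract run that prints to stdout."""
    return [exe, image_path, "stdout", "--psm", str(psm), "--oem", str(oem)]


def contains_phrase(ocr_text, phrase):
    """Case-insensitive substring check that ignores whitespace differences."""
    normalized_text = " ".join(ocr_text.lower().split())
    normalized_phrase = " ".join(phrase.lower().split())
    return normalized_phrase in normalized_text


def compare_modes(image_path, phrase, modes=PSM_MODES):
    """Run Tesseract once per mode and record whether the phrase was recognized."""
    results = {}
    for psm in modes:
        completed = subprocess.run(
            build_command(image_path, psm),
            capture_output=True,
            encoding="utf-8",   # decode Tesseract's output as UTF-8
            errors="replace",   # do not fail on undecodable bytes
            check=True,         # raise if Tesseract exits with an error
        )
        results[psm] = contains_phrase(completed.stdout, phrase)
    return results
```

Calling compare_modes(r"D:\split\Folder 001\1946-06-21.tiff", "weight") would return a dictionary mapping each mode to True or False, depending on whether the word showed up in that run's output.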

