Please see
https://github.com/tesseract-ocr/tesseract/issues/681#issuecomment-303027906

You can try changing the constants mentioned in that comment to see whether the results improve.
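
After each rebuild with different constants, one way to check for improvement is to rerun the same command and see whether the missing line appears. Below is a minimal sketch in Python, not a definitive tool. It assumes Python 3.7 or later and uses `stdout` as the output base so the text can be captured directly. The function names `build_command`, `contains_line` and `run_ocr` are made up for illustration; the paths and the `--psm 1 --oem 1` options are copied from the original post.

```python
import subprocess


def build_command(exe, image, psm=1, oem=1):
    """Build the tesseract command from the original post, writing text to stdout."""
    return [exe, "--psm", str(psm), "--oem", str(oem), image, "stdout"]


def contains_line(ocr_text, keyword):
    """Return True if any line of the OCR output contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return any(keyword in line.lower() for line in ocr_text.splitlines())


def run_ocr(exe, image, psm=1, oem=1):
    """Run tesseract and return the recognized text (raises if tesseract fails)."""
    result = subprocess.run(
        build_command(exe, image, psm, oem),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Paths taken from the original post; adjust them for your own setup.
    exe = r"E:\Tesseract\build\bin\Release\tesseract.exe"
    image = r"D:\split\Folder 001\1946-06-21.tiff"
    text = run_ocr(exe, image)
    print("weight line found:", contains_line(text, "weight"))
```

Running the script before and after a rebuild gives a quick yes/no answer for this one page; it says nothing about overall accuracy.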

ShreeDevi

On Tue, Jul 18, 2017 at 11:32 PM, Chris Hawley <[email protected]> wrote:

> The file that I am running OCR on:
>
> https://drive.google.com/file/d/0B-iKKP8eIvdgZkhObUVXUVJ1N28/view?usp=sharing
>
> Before anyone asks, it's part of the CIA's Crest Dataset. I noticed
> tesseract seems to skip over some text. The command that I am using is
>
> E:\Tesseract\build\bin\Release\tesseract.exe --psm 1 --oem 1 "D:\split\Folder 001\1946-06-21.tiff" test.txt
>
> The output is
>
> 21 June 1946
>
> MEMORANDUM For SUPERVISING AGENT,
> U. S. SECRET SERVICE,
> WHITE Hous®.
>
>
>
> 1. - It is requested that a White House pass be issued to
> Lieutenant General Hoyt S. VANDENBERG, Director of Central Intel-
>
> ligence.
>
>
>
> 2. - In connection with his official duties, it is necessary
> for General Vandenberg to visit the White House frequently,.
>
>
>
>
>
>
>
> 3% His physical description is:
>
> Height =-- 6 feet.
> Hair «-- _ @FAY ,
> Eyes -- _- blue.
>
> Enclosed herewith is his photograph.
>
> THOMAS F, CULLEN
> Captain, USNR
> Asgistant to the Director.
>
>
>
> If you notice, it skips over the "weight -- 165 lbs" line. I wasn't sure
> whether this qualifies as a bug. Is there anything that I can do to improve the
> results so that line is included?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/ef8c2b5c-0f42-4c6e-9d22-1e8fd821571e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

