[tesseract-ocr] PDF output not searchable within SumatraPDF

Chris Cameron Tue, 14 Oct 2014 23:46:11 -0700

This command:
$ tesseract.exe 18.jpg test

Gives me "test.txt", which has all the text from 18.jpg, as expected.


This command:
$ tesseract.exe 18.jpg test pdf

Gives me "test.pdf". When I open it in SumatraPDF, all of the PDF text can 
be highlighted, but a search from within the PDF finds only fragments of 
the sentences that exist in test.txt. When I open the same file in Adobe 
Reader, the find function locates all of the text.


My environment:
$ tesseract.exe -v
tesseract 3.04.00
 leptonica-1.71
  libjpeg 8d : libpng 1.5.18 : libtiff 4.0.3 : zlib 1.2.8

SumatraPDF v2.5.2

Adobe Reader 11.0.07


Can someone help me understand why this might be happening?


Thanks,
Chris

