I think your whole document needs enough surrounding margin; I ran into the
empty-page issue when my text was too close to the page edges. Your first
image has this margin, but your second does not.
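
If you want to check this before running tesseract, here is a minimal sketch
in plain Python. It assumes the image has already been loaded as 8-bit
grayscale rows (0 = black ink, 255 = white background). The helper names, the
128 ink threshold and the "minimum margin" value are my own illustrative
assumptions, not tesseract settings. In practice an image library would do
the padding; Pillow's ImageOps.expand, for example, adds a uniform border.

```python
# Hypothetical helpers (not part of tesseract). An image is a non-empty list
# of equal-length pixel rows: 0 = black ink, 255 = white background.

def measure_margins(rows, threshold=128):
    """Return (top, bottom, left, right): blank rows/columns between the ink
    and each edge, or None if no pixel is darker than `threshold`."""
    ink_rows = [y for y, row in enumerate(rows) if any(p < threshold for p in row)]
    if not ink_rows:
        return None
    height, width = len(rows), len(rows[0])
    ink_cols = [x for x in range(width) if any(row[x] < threshold for row in rows)]
    return (ink_rows[0], height - 1 - ink_rows[-1],
            ink_cols[0], width - 1 - ink_cols[-1])


def pad_to_margin(rows, min_margin, fill=255, threshold=128):
    """Add the same border of `fill` pixels on every side so that each edge
    ends up with at least `min_margin` blank pixels before the ink."""
    margins = measure_margins(rows, threshold)
    smallest = min(margins) if margins else min_margin  # blank image: no padding
    border = max(0, min_margin - smallest)
    blank = [fill] * (len(rows[0]) + 2 * border)
    padded = [list(blank) for _ in range(border)]
    padded += [[fill] * border + list(row) + [fill] * border for row in rows]
    padded += [list(blank) for _ in range(border)]
    return padded


# Usage: a 4x5 white image with one ink pixel touching the top edge.
image = [[255] * 5 for _ in range(4)]
image[0][2] = 0
print(measure_margins(image))   # (0, 3, 2, 2)
padded = pad_to_margin(image, 3)
print(measure_margins(padded))  # (3, 6, 5, 5)
```

Comparing measure_margins() for the two attachments should show quickly
whether the failing one has text running up to an edge.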

> On 26 Oct 2015, at 18:30, Daniel Kraft <[email protected]> wrote:
> 
> Hi all!
> 
> I've just started to experiment with tesseract (and OCR in general).  I would
> like to use it for reading sequences of numbers from pictures taken of an
> old screen.  I've trained tesseract for my situation, including the particular
> font used on the screen and only digits as characters.  After training,
> recognition usually works very well, without a single mistake (e.g., confusing
> 0 with 8 or 1 with 7).
> 
> However, sometimes tesseract simply refuses to recognise *any* content at 
> all, or only recognises text starting at some line halfway through the
> picture.  I found [1], which seems to be related.  However, resizing the 
> image canvas does not help me in my situation (see attachments and below).
> 
>   [1] https://groups.google.com/forum/#!topic/tesseract-ocr/eM7vClhtgw8
> 
> I've attached two images including the resulting text output (which cannot be 
> reproduced in this quality without training).  The pictures are based on 
> photographs but have been preprocessed already to improve contrast.  I don't 
> really see much of a difference in the visual quality between the "Failing" 
> and "Working" image, which makes me wonder why tesseract only outputs the 
> last lines of Failing while it gives perfect results (except for spurious 
> line breaks) in Working.  Any ideas what the issue could be?  Both images 
> have been created in the same way, with the same preprocessing parameters and 
> so on.
> 
> Thanks a lot!  Yours,
> Daniel
> -- 
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/tesseract-ocr/73d8219e-933d-478a-bc71-40394f612e37%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <Failing.png>
> <Failing.txt>
> <Working.png>
> <Working.txt>