The "usable" data is normally in German. I have to generate a test job with nonsense data, which will take some time.
On Sunday, June 7, 2015 at 19:10:38 UTC+2, Nicolas Nickisch wrote:
> I am trying to use tesseract 3.03 to OCR scanned pages.
>
> In many cases one scan job contains several jobs, which are separated by
> feeding a special separator page between them.
> This page contains only 12 "T" characters at the top left of the page (and a
> second line, upside down, at the bottom right).
>
> I have tried a lot, but it seems that tesseract completely ignores this text,
> even though the scan looks great. That page comes out completely empty! The
> rest of the OCRed text looks good.
>
> The idea is not mine, but I have to use this kind of separation.
>
> Is there something I can do to improve recognition of this special text?
>
> Nicolas Nickisch
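
For reference, the separation scheme described in the quoted message could be sketched roughly like this in Python. It only covers the step after OCR and assumes the text of each page is already available as a string. The names `is_separator` and `split_jobs` are made up for illustration, and the check for a run of 12 "T" characters is an assumption about what a correctly recognized separator page would look like.

```python
import re

# Assumption: a recognized separator page contains a run of 12 "T" characters.
SEPARATOR_RE = re.compile(r"T{12}")


def is_separator(page_text):
    """Return True if the page text contains 12 consecutive 'T's.

    Whitespace is removed first, because OCR output may put spaces
    between the letters.
    """
    compact = re.sub(r"\s+", "", page_text)
    return SEPARATOR_RE.search(compact) is not None


def split_jobs(pages):
    """Group per-page OCR texts into jobs, dropping the separator pages."""
    jobs = []
    current = []
    for text in pages:
        if is_separator(text):
            # A separator closes the current job (if it has any pages).
            if current:
                jobs.append(current)
            current = []
        else:
            current.append(text)
    if current:
        jobs.append(current)
    return jobs
```

With this sketch, `split_jobs(["page A", "TTTTTTTTTTTT", "page B", "page C"])` gives two jobs, one with "page A" and one with "page B" and "page C". It does not help as long as the separator page comes back empty from OCR, which is the actual problem here.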

