Hi,

AFAICT Tesseract's OCR quality deteriorates a lot when it is fed 'inverted colors', i.e. white text on a black background. (I can't dig up the Tesseract blog post / article where I first saw this mentioned, and Google fails me on this right this minute, sorry.)

Second, from what I gather from all the applications/code I've investigated that feed images to Tesseract, the last stage is always some type of 'threshold' stage where the image is converted to a simple black & white picture: Tesseract expects black text on a white background. Given your purple + yellow "image test" image, a simple threshold action would very probably render that as white text on a black background, which is the wrong way around if you want to get the best performance from Tesseract.

Hence a potential solution vector would be:

- Find ways to 'preprocess' your images to ensure each is converted to black text on a white background in the subsequent thresholding pass. (Do the thresholding yourself in your preprocessing to have maximum control over the image you feed to Tesseract.)

  Quick initial thought: it might be good enough to count the pixels of each hue, find the two major 'bulges' in the color distribution, and code a quick filter which assigns the hues in the smaller hump to black and those in the larger hump to white (text usually covers fewer pixels than the background). Another way would be to run a threshold filter and then do this counting on the threshold /output/: pixels there can only be black or white, as the threshold action outputs a monochrome image, so it is extremely easy to count the pixels and flip the colors if the black pixel count happens to be larger than the white pixel count. Just a rough idea, this.

- A quick Google search on 'tesseract white text black background' pops up this as the top entry for me:
  https://stackoverflow.com/questions/39002966/detect-white-characters-on-black-background-using-tesseract
  I did a quick scan of that one; it sounds like it might be worth checking out further.

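The threshold-then-count idea above could be sketched roughly like this. It is only a minimal illustration in plain Python, not tested against real images: the image is assumed to be a list of rows of 0..255 grayscale values, the fixed cutoff of 128 is an arbitrary assumption, and the function names (`threshold`, `ensure_dark_text_on_light`, `preprocess`) are made up for this example.

```python
# Sketch only: plain-Python lists stand in for a real image buffer.

def threshold(gray, cutoff=128):
    """Binarize a grayscale image (rows of 0..255 ints): 255 = white, 0 = black."""
    return [[255 if px >= cutoff else 0 for px in row] for row in gray]


def ensure_dark_text_on_light(binary):
    """Invert the image when black pixels outnumber white ones.

    Assumption: text covers less area than the background, so the
    majority color is the background and should end up white.
    """
    black = sum(px == 0 for row in binary for px in row)
    white = sum(px == 255 for row in binary for px in row)
    if black > white:
        return [[255 - px for px in row] for row in binary]
    return binary


def preprocess(gray, cutoff=128):
    """Threshold first, then flip the result if the background came out black."""
    return ensure_dark_text_on_light(threshold(gray, cutoff))


if __name__ == "__main__":
    # Tiny 3x5 example: a light "stroke" (220) on a dark background (40).
    sample = [
        [40, 40, 220, 40, 40],
        [40, 40, 220, 40, 40],
        [40, 40, 220, 40, 40],
    ]
    for row in preprocess(sample):
        # '#' marks black (text) pixels, '.' marks white (background) pixels.
        print("".join("#" if px == 0 else "." for px in row))
```

With a real image you would probably do the same two steps with an imaging library (e.g. an automatic threshold such as Otsu's method instead of the fixed cutoff) and only then hand the result to Tesseract. Note the majority-color assumption breaks down for very bold or tightly cropped text, so it is worth checking on a few of your ROIs.
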
HTH

Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   [email protected]
mobile: +31-6-11 120 978
--------------------------------------------------

On Thu, Oct 1, 2020 at 3:46 PM Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Fabian,
>
> Are you able to try removing the camera picture on the left? Or does it
> have to stay there? Maybe you can split your picture into smaller ones by
> looking for vertical delimiters?
>
> JM
>
> On Wednesday, September 30, 2020 at 06:50:44 UTC-4,
> [email protected] wrote:
>
>> Hello,
>>
>> I am currently working on an OCR for detecting text from some cropped
>> regions of interest. For most of the ROIs it works fine, but for example in
>> the attached image tesseract ignores 'Test'. I have tested different --psm
>> modes. DPI looks fine to me as well.
>>
>> - Any suggestions for further testing or preprocessing?
>> - Should I try to provide a set of ROIs for tesseract to train on?
>>
>> Thanks for your help!
>>
>> [image: cropped_roi_tesseract.png]
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/43d66ca1-10f9-40aa-ac02-5d9c8de2f598n%40googlegroups.com
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAFP60fq_NWR_spRc2Qwtrh93Sa%2BwRtWigtKR5hto8N%2Bz3VFOoA%40mail.gmail.com.

