I just found this: https://www.quora.com/How-do-I-fill-holes-in-image-using-image-processing/answer/V-Sri-Chakra-Kumar
On Wed, May 8, 2019 at 09:57, Lorenzo Bolzani wrote:
> Hi,
> you can try a few things, but you need to write a small script (Python,
> etc.) or use ImageMagick. I suggest first trying with GIMP to find what works
> best, and then writing the code. You want dark text on a light background.
>
> For white text on red:
>
> 1. Invert the image. Desaturate. Increase contrast.
>
> 2. Split the image into RGB channels and use the one that looks better (red
> probably). Also try decomposing into HSV and see if S or V looks good. From
> GIMP, do: Colors -> Components -> Decompose.
>
> 3. Invert the image and try thresholding (Otsu, etc.).
>
> With a little programming you can identify and isolate black regions from
> white ones, but I do not know if this is something you want to do.
>
> Post the image if this does not help.
>
> Lorenzo
>
> On Wed, May 8, 2019 at 03:07, Jason wrote:
>
>> I have a problem with the current Tesseract. I have documents that have
>> sections of varying background and text colors. I've read that Tesseract v3
>> was white/black invariant and it didn't matter if I had white text on a red
>> background. But now it matters. The problem is that other parts of the same
>> image are black text on a white background. Tesseract 4 fails to identify
>> the white text on the red background at all.
>>
>> I have tried inverting the image colors so red (0xFF0000) becomes cyan
>> (0x00FFFF) and the white text (0xFFFFFF) becomes black (0x000000). I then
>> take the highest-confidence text for the region. This improves some
>> scenarios, but for the red/white scenario it does not work.
>>
>> Questions:
>> 1. How can I extract the text so that it is black and the background white,
>> before using Tesseract?
>> 2. Is there a way to configure Tesseract to "just work"?
>>
>> I've been trying to figure out how to do this for some time, and I
>> haven't made any progress.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/0c9cb359-bde4-4c2e-9643-1a9c56b639dc%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
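For anyone who wants to script Lorenzo's step 1 (invert, desaturate, increase contrast) instead of doing it by hand in GIMP, here is a minimal sketch in Python with NumPy. It assumes the page is already loaded as an RGB uint8 array of shape (H, W, 3), for example with Pillow via `np.asarray(Image.open(path).convert("RGB"))`. The function name is hypothetical, the grayscale weights are the common ITU-R BT.601 luma coefficients (GIMP's Desaturate modes may give slightly different values), and "increase contrast" is interpreted here as a simple linear min/max stretch.

```python
import numpy as np


def invert_desaturate_stretch(rgb: np.ndarray) -> np.ndarray:
    """Invert an RGB uint8 image, desaturate it and stretch its contrast.

    rgb: uint8 array of shape (H, W, 3). Returns a uint8 array of shape (H, W).
    """
    # 1. Invert: 255 - value per channel (white text becomes black).
    inv = 255.0 - rgb.astype(np.float64)
    # 2. Desaturate with BT.601 luma weights (an assumption; GIMP offers several modes).
    gray = 0.299 * inv[..., 0] + 0.587 * inv[..., 1] + 0.114 * inv[..., 2]
    # 3. Increase contrast with a linear stretch of [min, max] onto [0, 255].
    lo, hi = gray.min(), gray.max()
    if hi == lo:  # flat image: nothing to stretch
        return np.zeros(gray.shape, dtype=np.uint8)
    stretched = (gray - lo) * (255.0 / (hi - lo))
    return np.clip(np.round(stretched), 0, 255).astype(np.uint8)


# Usage: white text on a pure red background.
page = np.zeros((10, 10, 3), dtype=np.uint8)
page[..., 0] = 255            # red background (0xFF0000)
page[4:6, 2:8] = 255          # white text block (0xFFFFFF)
out = invert_desaturate_stretch(page)
print(out[5, 5], out[0, 0])   # prints: 0 255 (text pixel, background pixel)
```

With pure white text on pure red, the text ends up at 0 and the background at 255, which is the dark-on-light polarity Lorenzo recommends. Real scans with anti-aliasing will produce intermediate grays, so a threshold step may still be needed afterwards.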

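Lorenzo's step 2 (keep the single RGB or HSV channel where text and background differ most) can also be automated. The sketch below, again Python with NumPy and hypothetical function names, scores each candidate plane by its standard deviation as a stand-in for "looks better"; that scoring rule is an assumption, and hue is left out because it is a circular quantity. Note that for the exact colours in Jason's question (0xFFFFFF text on 0xFF0000), text and background both have R = 255, so the red channel carries no contrast there, while G, B and saturation separate them cleanly.

```python
import numpy as np


def candidate_channels(rgb: np.ndarray) -> dict:
    """Return R, G, B and HSV saturation/value planes of an RGB uint8 image as uint8 arrays."""
    f = rgb.astype(np.float64)
    v = f.max(axis=-1)                       # HSV value = max(R, G, B)
    mn = f.min(axis=-1)
    # HSV saturation = (max - min) / max, defined as 0 for black pixels (max == 0).
    s = np.divide(v - mn, v, out=np.zeros_like(v), where=v > 0) * 255.0
    planes = {"R": f[..., 0], "G": f[..., 1], "B": f[..., 2], "S": s, "V": v}
    return {name: np.round(p).astype(np.uint8) for name, p in planes.items()}


def most_contrasted_channel(rgb: np.ndarray):
    """Pick the plane with the largest standard deviation (a stand-in for 'looks better')."""
    planes = candidate_channels(rgb)
    best = max(planes, key=lambda name: float(planes[name].std()))
    return best, planes[best]


# Usage: white text on a pure red background.
page = np.zeros((10, 10, 3), dtype=np.uint8)
page[..., 0] = 255            # red background (0xFF0000)
page[4:6, 2:8] = 255          # white text block (0xFFFFFF)
name, plane = most_contrasted_channel(page)
print(name, plane[5, 5], plane[0, 0])
```

Depending on which plane wins, the text may come out light on dark (G or B) or dark on light (S), so the chosen plane may still need inverting before it is handed to Tesseract.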

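Lorenzo's step 3 (threshold with Otsu's method), combined with the goal of dark text on a light background, might look like the following NumPy sketch. Rather than always inverting first, it applies Otsu's threshold and then inverts only if most pixels came out black. That majority rule assumes the background covers more of the region than the text; it is an assumption of this sketch, not something stated in the thread. The function names are hypothetical. If OpenCV is available, `cv2.threshold` with the `cv2.THRESH_OTSU` flag performs the thresholding part.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's threshold t for a uint8 image: pixels <= t vs pixels > t.

    t maximises n0*n1*(mu0 - mu1)**2 (proportional to the between-class
    variance), which with counts n and value sums s equals
    (s0*N - S*n0)**2 / (n0*n1).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    n_total = hist.sum()
    s_total = (hist * levels).sum()
    n0 = np.cumsum(hist)                 # pixels with value <= t
    s0 = np.cumsum(hist * levels)        # sum of those pixel values
    n1 = n_total - n0                    # pixels with value > t
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (s0 * n_total - s_total * n0) ** 2 / (n0 * n1)
    between[~np.isfinite(between)] = 0.0  # empty class: not a valid split
    return int(np.argmax(between))


def binarize_dark_on_light(gray: np.ndarray) -> np.ndarray:
    """Otsu-binarize, then invert if needed so the (assumed) majority background is white."""
    t = otsu_threshold(gray)
    binary = np.where(gray > t, 255, 0).astype(np.uint8)
    # Assumption: background pixels outnumber text pixels in the region.
    if np.count_nonzero(binary == 0) > binary.size // 2:
        binary = 255 - binary  # light text on dark background: flip it
    return binary


# Usage: light text (200) on a dark background (60).
region = np.full((10, 10), 60, dtype=np.uint8)
region[4:6, 2:8] = 200
out = binarize_dark_on_light(region)
print(otsu_threshold(region), out[5, 5], out[0, 0])  # prints: 60 0 255
```

Running this separately on each section of the page, rather than on the whole image, is what should let the black-on-white and white-on-red parts end up with the same polarity before recognition.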