I want to extract numbers from an image. Usually the numbers are placed around a figure, and sometimes inside it. I'm using Tesseract for this task. Tesseract works quite well for documents with a lot of text, but I have not found parameters that give good results here. I tried different page segmentation modes (PSM_SPARSE_TEXT should in theory work best), all engine modes, a character whitelist, disabling table detection, disabling the dictionary, and so on.
Usually the images look like the attached 'NumbersWithFigure'. [image: NumbersWithFigure.jpg]

Using a 'cleaned' image like the attached 'OnlyNumbers' did not give noticeably better results either. [image: OnlyNumbers.jpg]

I'm using Tess4j <https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j/4.5.3> to access Tesseract from Java like this:

    Tesseract1 tesseract = new Tesseract1(); // default language is eng, default OEM is TessOcrEngineMode.OEM_DEFAULT
    tesseract.setTessVariable("textord_tabfind_find_tables", "0"); // table detection disabled
    tesseract.setTessVariable("tessedit_enable_doc_dict", "0"); // don't use the document dictionary
    tesseract.setTessVariable("tessedit_char_whitelist", "0123456789"); // only digits
    tesseract.setTessVariable("load_system_dawg", "0"); // system dictionary will not be loaded
    tesseract.setPageSegMode(TessPageSegMode.PSM_SPARSE_TEXT);
    tesseract.setDatapath(new File("./tessdata/").getAbsolutePath());
    System.out.println("Words: " + tesseract.getWords(entry.getValue(), TessPageIteratorLevel.RIL_WORD));

Any ideas (parameters and/or links to specialized training data)?

I've also posted this question on StackOverflow (here <https://stackoverflow.com/questions/64354275/>), but maybe I'll have more luck here :-)
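
To make the digits-only intent concrete independently of the whitelist setting, here is a minimal sketch of a post-recognition filter. It is not part of the setup above and is not expected to improve recognition itself: `NumberFilter` and `extractNumbers` are hypothetical names, and the input is assumed to be the plain text of each recognized word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the digit runs found in recognized word texts.
class NumberFilter {
    // One or more consecutive ASCII digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every digit run, in order of appearance, from the given texts.
    public static List<String> extractNumbers(List<String> wordTexts) {
        List<String> numbers = new ArrayList<>();
        for (String text : wordTexts) {
            Matcher m = DIGITS.matcher(text);
            while (m.find()) {
                numbers.add(m.group());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Made-up sample input, not real OCR output.
        List<String> sample = Arrays.asList("12", "a3b", "x", "4 5");
        System.out.println(extractNumbers(sample)); // prints [12, 3, 4, 5]
    }
}
```

With Tess4j, the word texts could presumably be collected from the `Word` objects returned by `getWords` (via `getText()`) before being passed to such a filter.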

