What are you unhappy with: detection rate or recognition accuracy? All in all, there are many reasons why Tesseract can perform poorly here. Some kind of preprocessing is definitely needed; which kind depends on the image.
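As one possible starting point for the preprocessing mentioned above, here is a minimal sketch that converts an image to grayscale and binarizes it with a global Otsu threshold, using only the Java standard library. The class name `PreprocessSketch` and its methods are hypothetical helpers, not part of Tess4J or Tesseract. Tesseract already applies its own global thresholding internally, so this step alone may not change results; it is mainly a place to plug in whatever actually suits the image (upscaling, adaptive thresholding, denoising, cropping to known regions).

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of Tess4J or Tesseract.
final class PreprocessSketch {

    // Convert every pixel to an 8-bit gray level using the common luma weights.
    static int[] toGray(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        int[] gray = new int[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[y * w + x] = (299 * r + 587 * g + 114 * b) / 1000;
            }
        }
        return gray;
    }

    // Otsu's method: pick the threshold that maximizes between-class variance.
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) hist[v]++;
        long total = gray.length, sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long sumDark = 0, countDark = 0;
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            countDark += hist[t];
            sumDark += (long) t * hist[t];
            if (countDark == 0) continue;          // no pixels at or below t yet
            long countLight = total - countDark;
            if (countLight == 0) break;            // every pixel is at or below t
            double diff = (double) sumDark / countDark
                        - (double) (sumAll - sumDark) / countLight;
            double var = (double) countDark * countLight * diff * diff;
            if (var > bestVar) { bestVar = var; best = t; }
        }
        return best;
    }

    // Pixels brighter than the threshold become white, the rest black.
    static BufferedImage binarize(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        int[] gray = toGray(src);
        int t = otsuThreshold(gray);
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out.setRGB(x, y, gray[y * w + x] > t ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    // Usage: java PreprocessSketch.java input.png output.png
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: PreprocessSketch <input> <output.png>");
            return;
        }
        BufferedImage in = ImageIO.read(new File(args[0]));
        if (in == null) {
            System.out.println("unsupported or unreadable image: " + args[0]);
            return;
        }
        ImageIO.write(binarize(in), "png", new File(args[1]));
    }
}
```

The binarized `BufferedImage` could then be handed to the same Tess4J call used in the quoted message, for example `instance.getWords(PreprocessSketch.binarize(image), pageIteratorLevel)`, assuming `image` there is a `BufferedImage`. Whether this helps at all can only be judged against the actual sample image.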
To say more, I would need to know:
- 5-10 concrete examples of the words you are going to look for,
- their bounding boxes within your sample image.

Once I have that, I might be able to help.

Best regards,
Dmitri Silaev
www.CustomOCR.com

On Fri, Oct 13, 2017 at 9:05 AM, Paolo Giannoccaro <[email protected]> wrote:
> Hi,
> I need to detect a fixed set of words in the attached image; not all of
> them are part of the canonical English dictionary (for example, words
> could be acronyms).
>
> I tried detection on the full image and on split sub-images, but the
> quality of detection is low.
>
> I use Tess4J, and the most important parts of my code are:
>
> //initialize
> ITesseract instance = new Tesseract();
> instance.setTessVariable(VAR_CHAR_WHITELIST, WHITELIST_DEFAULT);
>
> //detect
> int pageIteratorLevel = TessPageIteratorLevel.RIL_WORD;
> List<Word> result = instance.getWords(image, pageIteratorLevel);
>
> Any help?
> Thanks a lot

