I asked for a few bounding boxes so that we can all locate the required words in the image. Depending on what the words look like, different methods may or may not work. Your image is 135 megapixels. You should give as much information as possible to make life easier for the people who are willing to help, shouldn't you?
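To make "bounding boxes" concrete for an image of this size, here is a minimal sketch of one possible way to cut it into overlapping tiles and still report every box in full-image coordinates. This is an illustration, not what anyone in this thread is actually running: the class name `TileGrid`, the methods `tiles` and `toFullImage`, and the tile size and overlap values are all made up for the example. It only uses `java.awt.Rectangle` from the JDK, so each tile could afterwards be handed to whatever OCR call you use.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

public class TileGrid {

    /**
     * Covers a width x height image with tiles of at most tile x tile pixels.
     * Neighbouring tiles share `overlap` pixels so that a word sitting on a
     * tile border is fully contained in at least one tile (assuming the word
     * is smaller than the overlap).
     */
    public static List<Rectangle> tiles(int width, int height, int tile, int overlap) {
        if (tile <= overlap) {
            throw new IllegalArgumentException("tile must be larger than overlap");
        }
        List<Rectangle> out = new ArrayList<>();
        int step = tile - overlap; // distance between the origins of adjacent tiles
        for (int y = 0; y < height; y += step) {
            int h = Math.min(tile, height - y); // clip the last row to the image
            for (int x = 0; x < width; x += step) {
                int w = Math.min(tile, width - x); // clip the last column to the image
                out.add(new Rectangle(x, y, w, h));
                if (x + w >= width) {
                    break; // this tile already reaches the right edge
                }
            }
            if (y + h >= height) {
                break; // this row already reaches the bottom edge
            }
        }
        return out;
    }

    /** Converts a box expressed relative to a tile into full-image coordinates. */
    public static Rectangle toFullImage(Rectangle tile, Rectangle boxInTile) {
        return new Rectangle(tile.x + boxInTile.x, tile.y + boxInTile.y,
                             boxInTile.width, boxInTile.height);
    }

    public static void main(String[] args) {
        // Hypothetical sizes, chosen only to keep the output short.
        List<Rectangle> grid = tiles(5000, 3000, 2000, 200);
        for (Rectangle r : grid) {
            System.out.println(r);
        }
    }
}
```

Boxes reported this way (x, y, width, height in the original image) are exactly the kind of information that lets other people find the words without searching the whole image.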
On Mon, Oct 16, 2017 at 2:01 PM, Paolo Giannoccaro <[email protected]> wrote:

> Thank Art for your contribution.
> The words that I have to extract from the attached sample are: ost, stain,
> stn, resd, o stn (they occur several times, in total there are 20 words).
> I am currently working with OpenCV to preprocess the image and find a raw
> detection of rectangles that contain text. Then I use Tesseract to check
> each rectangle and make ocr. Till now I am able to get 10 of 20 words.
>
> Of course if I already could have bounding boxes for each word, I would
> already solved the problem.
>
>
> On Saturday, October 14, 2017 at 10:29:29 PM UTC+2, Dmitri Silaev wrote:
>>
>> What are you unhappy with: detection rate or recognition accuracy? All in
>> all, there's a ton of reasons why Tess can work poorly here. Some kind of
>> preprocessing is definitely needed. What kind? It depends.
>>
>> I personally would say that I need to know:
>> - 5-10 concrete examples of words you are going to look for,
>> - their bounding boxes within your sample image.
>>
>> Once I have it, I might be able to help.
>>
>> Best regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>>
>> On Fri, Oct 13, 2017 at 9:05 AM, Paolo Giannoccaro <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I need to detect a fixed set of words in the attached image, not all are
>>> part of canonical english dictionary (for example words could be acronyms).
>>>
>>> I tried detection on full image or iterating on splitted sub-images, but
>>> quality of detection is low.
>>>
>>> I use Tess4J and the most important part of my code are:
>>>
>>> //initialize
>>> ITesseract instance = new Tesseract();
>>> instance.setTessVariable(VAR_CHAR_WHITELIST, WHITELIST_DEFAULT);
>>>
>>> //detect
>>> int pageIteratorLevel = TessPageIteratorLevel.RIL_WORD;
>>> List<Word> result = instance.getWords(image, pageIteratorLevel);
>>>
>>> Any help ?
>>> Thanks a lot

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAKzLxFPfXA%2BRZ4_K%2BLstXD3rYAx3N_eR491vw0bkMLFFJcAh3g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

