This is the ground truth for my image:

4300710413_SampleLog.tif,ost,1158,10307,1247,10353
4300710413_SampleLog.tif,ost,1161,10389,1244,10435
4300710413_SampleLog.tif,ost,1158,10515,1237,10560
4300710413_SampleLog.tif,ost,1329,10554,1418,10599
4300710413_SampleLog.tif,o stn,1253,10718,1403,10761
4300710413_SampleLog.tif,stn,1280,10992,1369,11038
4300710413_SampleLog.tif,ost,1351,11315,1452,11364
4300710413_SampleLog.tif,ost,1152,11476,1243,11519
4300710413_SampleLog.tif,ost,1155,11522,1259,11559
4300710413_SampleLog.tif,ost,1213,11683,1293,11729
4300710413_SampleLog.tif,ost,1161,12198,1244,12253
4300710413_SampleLog.tif,ost,1051,12856,1139,12901
4300710413_SampleLog.tif,ost,1351,13084,1455,13130
4300710413_SampleLog.tif,ost,1139,13413,1219,13471
4300710413_SampleLog.tif,ost,1198,13940,1296,13985
4300710413_SampleLog.tif,ost,1348,16025,1430,16080
4300710413_SampleLog.tif,ost,1385,16638,1476,16680
4300710413_SampleLog.tif,ost,1391,16683,1476,16729
4300710413_SampleLog.tif,ost,1326,17000,1403,17049
4300710413_SampleLog.tif,stn,1094,17082,1188,17134
4300710413_SampleLog.tif,ost,1124,17365,1210,17414
4300710413_SampleLog.tif,ost,1246,17446,1326,17484
4300710413_SampleLog.tif,ost,1250,17490,1348,17527
4300710413_SampleLog.tif,st,1018,18130,1071,18165
4300710413_SampleLog.tif,ost,1227,18848,1309,18885
4300710413_SampleLog.tif,ost,1337,19121,1413,19172
4300710413_SampleLog.tif,ost,1137,19894,1213,19942
4300710413_SampleLog.tif,stn,600,21683,685,21721
4300710413_SampleLog.tif,stn,844,22080,939,22123
4300710413_SampleLog.tif,stn,954,22169,1035,22211

For each word we have the coordinates of the top-left and bottom-right corners.
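To check how many of these words a detection pipeline actually finds, each detected box can be scored against the ground truth by intersection over union (IoU). The sketch below is a hypothetical helper, not part of the original code. It assumes each row is `filename,label,x1,y1,x2,y2` with (x1, y1) the top-left and (x2, y2) the bottom-right corner, that detections have already been converted to the same form, and an IoU threshold of 0.5, which is a common but arbitrary choice. Matching is greedy (each detection is used at most once), which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: score word detections against the ground-truth CSV.
public class GroundTruthEval {

    // One labelled box; area is (x2 - x1) * (y2 - y1) in pixels.
    static final class Box {
        final String label;
        final int x1, y1, x2, y2;

        Box(String label, int x1, int y1, int x2, int y2) {
            this.label = label;
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        long area() {
            return (long) (x2 - x1) * (y2 - y1);
        }
    }

    // Parse "file,label,x1,y1,x2,y2"; labels such as "o stn" may contain spaces but not commas.
    static Box parse(String row) {
        String[] f = row.split(",");
        return new Box(f[1].trim(),
                Integer.parseInt(f[2].trim()), Integer.parseInt(f[3].trim()),
                Integer.parseInt(f[4].trim()), Integer.parseInt(f[5].trim()));
    }

    // Intersection over union of two axis-aligned boxes (0.0 when they do not overlap).
    static double iou(Box a, Box b) {
        int ix = Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1);
        int iy = Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1);
        if (ix <= 0 || iy <= 0) return 0.0;
        long inter = (long) ix * iy;
        return (double) inter / (a.area() + b.area() - inter);
    }

    // Count ground-truth words hit by an unused detection with the same label and IoU >= threshold.
    static int countFound(List<Box> truth, List<Box> detected, double threshold) {
        boolean[] used = new boolean[detected.size()];
        int found = 0;
        for (Box t : truth) {
            for (int i = 0; i < detected.size(); i++) {
                Box d = detected.get(i);
                if (!used[i] && d.label.equals(t.label) && iou(t, d) >= threshold) {
                    used[i] = true;
                    found++;
                    break;
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Box> truth = new ArrayList<>();
        truth.add(parse("4300710413_SampleLog.tif,ost,1158,10307,1247,10353"));
        truth.add(parse("4300710413_SampleLog.tif,o stn,1253,10718,1403,10761"));
        // Made-up detections for illustration: a good "ost" box and a "stn" that misses "o stn".
        List<Box> detected = List.of(
                new Box("ost", 1160, 10310, 1245, 10355),
                new Box("stn", 1330, 10720, 1403, 10761));
        System.out.println(countFound(truth, detected, 0.5) + " of " + truth.size());
    }
}
```

Requiring the label to match as well as the box means a detection read as "stn" does not count for a ground-truth "o stn", so the score reflects both detection and recognition.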
On Monday, October 16, 2017 at 8:35:12 PM UTC+2, Dmitri Silaev wrote:
>
> I asked for a few bounding boxes so that we can all locate the required
> words inside the image. Depending on what they are, different methods may
> or may not work. Your image is 135 megapixels in size. You should give as
> much information as possible to make life easier for people who are
> willing to help, shouldn't you?
>
> On Mon, Oct 16, 2017 at 2:01 PM, Paolo Giannoccaro wrote:
>
>> Thanks, Art, for your contribution.
>> The words that I have to extract from the attached sample are: ost,
>> stain, stn, resd, o stn (they occur several times; in total there are 20
>> words).
>> I am currently working with OpenCV to preprocess the image and get a rough
>> detection of the rectangles that contain text. Then I use Tesseract to
>> check each rectangle and run OCR on it. So far I am able to get 10 of the
>> 20 words.
>>
>> Of course, if I already had bounding boxes for each word, I would have
>> already solved the problem.
>>
>> On Saturday, October 14, 2017 at 10:29:29 PM UTC+2, Dmitri Silaev wrote:
>>>
>>> What are you unhappy with: detection rate or recognition accuracy? All
>>> in all, there are a ton of reasons why Tess may work poorly here. Some
>>> kind of preprocessing is definitely needed. What kind? It depends.
>>>
>>> Personally, I would need to know:
>>> - 5-10 concrete examples of words you are going to look for,
>>> - their bounding boxes within your sample image.
>>>
>>> Once I have them, I might be able to help.
>>>
>>> Best regards,
>>> Dmitri Silaev
>>> www.CustomOCR.com
>>>
>>> On Fri, Oct 13, 2017 at 9:05 AM, Paolo Giannoccaro wrote:
>>>
>>>> Hi,
>>>> I need to detect a fixed set of words in the attached image. Not all of
>>>> them are part of the canonical English dictionary (for example, some
>>>> words may be acronyms).
>>>>
>>>> I tried detection on the full image and also by iterating over split
>>>> sub-images, but the detection quality is low.
>>>>
>>>> I use Tess4J, and the most important parts of my code are:
>>>>
>>>> import java.util.List;
>>>> import net.sourceforge.tess4j.ITessAPI.TessPageIteratorLevel;
>>>> import net.sourceforge.tess4j.ITesseract;
>>>> import net.sourceforge.tess4j.Tesseract;
>>>> import net.sourceforge.tess4j.Word;
>>>>
>>>> //initialize: whitelist the characters Tesseract may output
>>>> ITesseract instance = new Tesseract();
>>>> instance.setTessVariable(VAR_CHAR_WHITELIST, WHITELIST_DEFAULT);
>>>>
>>>> //detect: one Word (text, confidence, bounding box) per word in the BufferedImage
>>>> int pageIteratorLevel = TessPageIteratorLevel.RIL_WORD;
>>>> List<Word> result = instance.getWords(image, pageIteratorLevel);
>>>>
>>>> Any help?
>>>> Thanks a lot
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/90295194-26a9-4f31-bd9d-63d61d7bd592%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
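Because the target set is small and fixed (ost, stain, stn, resd, o stn), one common post-processing step, not something proposed in the thread, is to normalize each recognized string (in Tess4J, the text of each `Word` returned by `getWords`) and snap it to the nearest target by Levenshtein edit distance, rejecting anything too far away. This is a minimal plain-Java sketch: the class name `VocabularyMatcher`, the normalization rule and the one-edit threshold are hypothetical choices, and ties go to the word listed first.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical post-processing: map raw OCR strings onto a fixed vocabulary.
public class VocabularyMatcher {

    // Target words as listed in the thread.
    static final List<String> TARGETS = List.of("ost", "stain", "stn", "resd", "o stn");

    // Two-row dynamic-programming Levenshtein distance (insert, delete, substitute cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Lower-case, keep only letters, digits and spaces, collapse repeated spaces.
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", "")
                .replaceAll(" +", " ")
                .trim();
    }

    // Closest target within maxDistance edits, or empty if none is close enough.
    static Optional<String> match(String raw, int maxDistance) {
        String s = normalize(raw);
        if (s.isEmpty()) return Optional.empty();
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String t : TARGETS) {
            int d = levenshtein(s, t);
            if (d < bestDist) { // strict "<": on a tie the earlier target wins
                bestDist = d;
                best = t;
            }
        }
        return bestDist <= maxDistance ? Optional.of(best) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String raw : List.of("OST", "s1n", "stn.", "o stn", "logging")) {
            System.out.println(raw + " -> " + match(raw, 1).orElse("(no match)"));
        }
    }
}
```

Digits are kept rather than stripped so that typical confusions such as "1" for "t" or "0" for "o" cost one substitution. With words this short, a threshold above one edit would start merging targets like "ost" and "stn", so it should be tuned against the ground truth.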

