This is the ground truth for my image:

4300710413_SampleLog.tif,ost,1158,10307,1247,10353 
4300710413_SampleLog.tif,ost,1161,10389,1244,10435 
4300710413_SampleLog.tif,ost,1158,10515,1237,10560 
4300710413_SampleLog.tif,ost,1329,10554,1418,10599 
4300710413_SampleLog.tif,o stn,1253,10718,1403,10761 
4300710413_SampleLog.tif,stn,1280,10992,1369,11038 
4300710413_SampleLog.tif,ost,1351,11315,1452,11364 
4300710413_SampleLog.tif,ost,1152,11476,1243,11519 
4300710413_SampleLog.tif,ost,1155,11522,1259,11559 
4300710413_SampleLog.tif,ost,1213,11683,1293,11729 
4300710413_SampleLog.tif,ost,1161,12198,1244,12253 
4300710413_SampleLog.tif,ost,1051,12856,1139,12901 
4300710413_SampleLog.tif,ost,1351,13084,1455,13130 
4300710413_SampleLog.tif,ost,1139,13413,1219,13471 
4300710413_SampleLog.tif,ost,1198,13940,1296,13985 
4300710413_SampleLog.tif,ost,1348,16025,1430,16080 
4300710413_SampleLog.tif,ost,1385,16638,1476,16680 
4300710413_SampleLog.tif,ost,1391,16683,1476,16729 
4300710413_SampleLog.tif,ost,1326,17000,1403,17049 
4300710413_SampleLog.tif,stn,1094,17082,1188,17134 
4300710413_SampleLog.tif,ost,1124,17365,1210,17414 
4300710413_SampleLog.tif,ost,1246,17446,1326,17484 
4300710413_SampleLog.tif,ost,1250,17490,1348,17527 
4300710413_SampleLog.tif,st,1018,18130,1071,18165 
4300710413_SampleLog.tif,ost,1227,18848,1309,18885 
4300710413_SampleLog.tif,ost,1337,19121,1413,19172 
4300710413_SampleLog.tif,ost,1137,19894,1213,19942 
4300710413_SampleLog.tif,stn,600,21683,685,21721 
4300710413_SampleLog.tif,stn,844,22080,939,22123 
4300710413_SampleLog.tif,stn,954,22169,1035,22211 
For each word we have the coordinates of the top-left and bottom-right corners.
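
A minimal sketch of how rows like these could be loaded and compared against detector output. It assumes the four numbers are ordered x1, y1, x2, y2 (left, top, right, bottom) in pixels, which the listing does not state explicitly. The names GroundTruth, Box, parseLine and iou are made up for illustration and are not part of Tess4J or OpenCV.

```java
import java.awt.Rectangle;

// Hypothetical helper, not part of Tess4J: parses one ground-truth row and
// scores a detected rectangle against it.
final class GroundTruth {

    // One annotated word: source file, label text, and pixel rectangle.
    static final class Box {
        final String file;
        final String label;
        final Rectangle rect;

        Box(String file, String label, Rectangle rect) {
            this.file = file;
            this.label = label;
            this.rect = rect;
        }
    }

    // Parses "file,label,x1,y1,x2,y2". Assumes (x1, y1) is the top-left and
    // (x2, y2) the bottom-right corner; a label may contain a space ("o stn").
    static Box parseLine(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + line);
        }
        int x1 = Integer.parseInt(f[2].trim());
        int y1 = Integer.parseInt(f[3].trim());
        int x2 = Integer.parseInt(f[4].trim());
        int y2 = Integer.parseInt(f[5].trim());
        return new Box(f[0].trim(), f[1].trim(),
                new Rectangle(x1, y1, x2 - x1, y2 - y1));
    }

    // Intersection over union of two rectangles, in the range [0, 1].
    static double iou(Rectangle a, Rectangle b) {
        Rectangle i = a.intersection(b);
        if (i.isEmpty()) {
            return 0.0;
        }
        double inter = (double) i.width * i.height;
        double union = (double) a.width * a.height
                + (double) b.width * b.height - inter;
        return inter / union;
    }

    public static void main(String[] args) {
        Box truth = parseLine("4300710413_SampleLog.tif,o stn,1253,10718,1403,10761");
        // Made-up detection, only to show the call.
        Rectangle detected = new Rectangle(1250, 10715, 150, 45);
        System.out.println(truth.label + " " + truth.rect
                + " IoU=" + iou(truth.rect, detected));
    }
}
```

A detection could then be counted as a hit when its IoU with some ground-truth box exceeds a chosen threshold (0.5 is a common choice, but that value is an assumption here) and the recognized text matches the label.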


On Monday, October 16, 2017 at 8:35:12 PM UTC+2, Dmitri Silaev wrote:
>
> I asked for a few bounding boxes so that we can all locate the required 
> words inside the image. Depending on what they are, various methods may or 
> may not work. Your image is 135 megapixels in size. You should give as much 
> information as possible to make life easier for people who are willing to 
> help, shouldn't you?
>
>
>
> On Mon, Oct 16, 2017 at 2:01 PM, Paolo Giannoccaro <[email protected]> wrote:
>
>> Thanks, Art, for your contribution.
>> The words that I have to extract from the attached sample are: ost, 
>> stain, stn, resd, o stn (they occur several times; in total there are 20 
>> words).
>> I am currently working with OpenCV to preprocess the image and get a rough 
>> detection of rectangles that contain text. Then I use Tesseract to check 
>> each rectangle and run OCR. So far I am able to get 10 of the 20 words.
>>
>> Of course, if I already had bounding boxes for each word, I would 
>> have already solved the problem.
>>
>>
>> On Saturday, October 14, 2017 at 10:29:29 PM UTC+2, Dmitri Silaev wrote:
>>>
>>> What are you unhappy with: detection rate or recognition accuracy? All 
>>> in all, there's a ton of reasons why Tess can work poorly here. Some kind 
>>> of preprocessing is definitely needed. What kind? It depends.
>>>
>>> I personally would say that I need to know:
>>> - 5-10 concrete examples of words you are going to look for,
>>> - their bounding boxes within your sample image.
>>>
>>> Once I have them, I might be able to help.
>>>
>>> Best regards,
>>> Dmitri Silaev
>>> www.CustomOCR.com
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Oct 13, 2017 at 9:05 AM, Paolo Giannoccaro <[email protected]> wrote:
>>>
>>>> Hi,
>>>> I need to detect a fixed set of words in the attached image. Not all 
>>>> of them are part of a standard English dictionary (for example, some 
>>>> words could be acronyms).
>>>>
>>>> I tried detection on the full image and on sub-images obtained by 
>>>> splitting it, but the quality of detection is low.
>>>>
>>>> I use Tess4J, and the most important parts of my code are:
>>>>
>>>> //initialize
>>>> ITesseract instance = new Tesseract();
>>>> instance.setTessVariable(VAR_CHAR_WHITELIST, WHITELIST_DEFAULT);
>>>>
>>>> //detect
>>>> int pageIteratorLevel = TessPageIteratorLevel.RIL_WORD;
>>>> List<Word> result = instance.getWords(image, pageIteratorLevel);
>>>>
>>>> Any help?
>>>> Thanks a lot
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/90295194-26a9-4f31-bd9d-63d61d7bd592%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>

