Thanks, Art, for your contribution.
The words that I have to extract from the attached sample are: ost, stain, 
stn, resd, o stn (they occur several times; in total there are 20 word 
occurrences).
I am currently using OpenCV to preprocess the image and get a rough 
detection of the rectangles that contain text. Then I run Tesseract OCR on 
each rectangle. So far I can recover 10 of the 20 words.
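
A minimal sketch of how the per-rectangle check could tolerate small OCR errors, assuming the text returned for each rectangle is compared against the fixed word list by edit distance. This is an illustration, not the code used in the pipeline described above: the class name WordMatcher, the method names, and the maxDistance threshold are all hypothetical, and only the Java standard library is used.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: maps a raw OCR string to the closest word in a fixed vocabulary. */
final class WordMatcher {

    // The target words named in this message.
    static final List<String> TARGETS = List.of("ost", "stain", "stn", "resd", "o stn");

    /** Levenshtein distance (insert, delete, substitute) using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the closest target within maxDistance edits, or empty if none is close enough. */
    static Optional<String> closestTarget(String ocrText, int maxDistance) {
        // Normalize case and whitespace before comparing (an assumption about the OCR output).
        String cleaned = ocrText.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String target : TARGETS) {
            int d = editDistance(cleaned, target);
            if (d < bestDistance) {
                bestDistance = d;
                best = target;
            }
        }
        return bestDistance <= maxDistance ? Optional.of(best) : Optional.empty();
    }
}
```

Because several targets are short and close to each other ("ost" and "stn" are only two edits apart), a small threshold such as 1 is probably the safest starting point; a larger value would start mapping one target onto another.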

Of course, if I already had bounding boxes for each word, I would 
already have solved the problem.


On Saturday, October 14, 2017 at 10:29:29 PM UTC+2, Dmitri Silaev wrote:
>
> What are you unhappy with: detection rate or recognition accuracy? All in 
> all, there's a ton of reasons why Tess can work poorly here. Some kind of 
> preprocessing is definitely needed. What kind? It depends.
>
> I personally would say that I need to know:
> - 5-10 concrete examples of words you are going to look for,
> - their bounding boxes within your sample image.
>
> Once I have it, I might be able to help.
>
> Best regards,
> Dmitri Silaev
> www.CustomOCR.com
>
>
>
>
>
> On Fri, Oct 13, 2017 at 9:05 AM, Paolo Giannoccaro wrote:
>
>> Hi,
>> I need to detect a fixed set of words in the attached image. Not all of 
>> them are in a standard English dictionary (for example, some are acronyms).
>>
>> I tried detection on the full image and on sub-images split from it, but 
>> the quality of detection is low.
>>
>> I use Tess4J, and the most important parts of my code are:
>>
>> //initialize
>> ITesseract instance = new Tesseract();
>> instance.setTessVariable(VAR_CHAR_WHITELIST, WHITELIST_DEFAULT);
>>
>> //detect
>> int pageIteratorLevel = TessPageIteratorLevel.RIL_WORD;
>> List<Word> result = instance.getWords(image, pageIteratorLevel);
>>
>> Any help?
>> Thanks a lot
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/90295194-26a9-4f31-bd9d-63d61d7bd592%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/2a4e7de3-3ff3-4085-80f4-6fb2767a6938%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
