[tesseract-ocr] Config hints to improve recognition accuracy.

Clint William Theron Sat, 31 Aug 2019 08:29:15 -0700

Thanks for your response. I have already tried your suggestions, and now and
then I get the desired result. What I want to do now is train Tesseract, but
I can't get Tesseract to use my traineddata language. My app is a
browser-based web app served by an Apache HTTP server. I would appreciate it
if you could answer my SO question:


https://stackoverflow.com/questions/57715343/how-do-i-specify-traineddata-language-path-and-language-code-when-using-tesser

Thanks

On Friday, August 30, 2019, René Hansen <[email protected]> wrote:
> A few config params won't do the trick. You need to preprocess the image.
> Make sure you read this:
> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>
> Ideally, I think you need to cook down the image you give Tesseract to
> something like this:
>
> [inline image]
>
> Even this isn't quite good enough though. I get "NG: 1020452" as a result
> from https://tesseract.projectnaptha.com
>
> You might need to train on this specific font to get better results, or
> do further preprocessing to increase accuracy.
>
> /René
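
A minimal sketch of the kind of preprocessing suggested above, assuming the
image is available as RGBA pixel data (for example the `data` array returned
by a canvas `getImageData` call in the browser). The `binarize` helper and
its default threshold of 128 are illustrative assumptions and not part of
Tesseract.js. A fixed threshold may not suit unevenly lit tickets.

```javascript
// Hypothetical helper (not a Tesseract.js API): convert RGBA pixels to pure
// black and white so the text stands out from the background.
function binarize(rgba, threshold = 128) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // Rec. 601 luma weights approximate perceived brightness.
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    const value = luma >= threshold ? 255 : 0;
    out[i] = value;     // red
    out[i + 1] = value; // green
    out[i + 2] = value; // blue
    out[i + 3] = 255;   // force the pixel to be fully opaque
  }
  return out;
}

// Usage: one light pixel and one dark pixel.
const sample = new Uint8ClampedArray([200, 200, 200, 255, 30, 30, 30, 128]);
console.log(Array.from(binarize(sample)).join(","));
```

The result can be written back to a canvas and that canvas passed to the OCR
step in place of the original photo.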
>
> On Fri, 30 Aug 2019 at 21:19, Clint William Theron <[email protected]> wrote:
>>
>> Consider the following image and output:
>>
>>
>> [inline image]
>>
>> Tesseract's recognition output:
>> LUHO: R54 MILLION GTD
>> LOTTO PLUS 1: R6,! MILLION est
>> LOTTO PLUS 2: R7,4 MILLION est
>> NIN YOUR SHARE OF R1,! MILLION!!!
>> Buy any NATIONAL LOTTERY t1cket ther
>> SMS :ID,#PLAY,TICKET CODE TO 34909.
>> Cash Prizes to be won!!! T’s and C’
>> apply vtsit National Lottery website
>> PLEASE RETAIN YOUR ENTRY TICKET!
>> First Draw: Saturday 20/07/19
>> VALID RECEIPT FOR 1 Oraw(S)
>> FROM DRAW 1937 To 1937
>> LOTTO PLUS 1: ND
>> LUTTU PLUS 2: ND
>> ‘TotaT:R5.00
>> _‘,{gxt, Inc! 152 VA
>>
>> I'm a newbie when it comes to Tesseract.js. I know there is a way to
>> include config parameters to increase OCR accuracy. In the above image
>> I'm interested in getting the numbers between the two horizontal dashed
>> lines. Could you suggest a few config parameters to include in the
>> recognize method that might improve the OCR accuracy?
>> Thank you in advance. PS: Anything would be helpful.
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/2a937bf3-8c97-466d-a9bb-26a277e02522%40googlegroups.com.
>
>
> --
> Never fear, Linux is here.
>

