Thanks for your response. I have already tried your suggestions, and now and then I get the desired result. What I'd like to do now is train Tesseract, but I can't get Tesseract to use my traineddata language. My app is a browser web app served from an Apache HTTP server. I would appreciate it if you could answer my Stack Overflow question:
https://stackoverflow.com/questions/57715343/how-do-i-specify-traineddata-language-path-and-language-code-when-using-tesser

Thanks

On Friday, August 30, 2019, René Hansen <[email protected]> wrote:
> A few config params won't do the trick. You need to preprocess the image. Make sure you read this: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>
> Ideally I think you need to cook down the image you give tesseract to something like this:
>
> [inline image]
>
> Even this isn't quite good enough though. I get "NG: 1020452" as a result from https://tesseract.projectnaptha.com
>
> You might need to train on this specific font to get better results, or do further preprocessing to increase accuracy.
>
> /René
>
> On Fri, 30 Aug 2019 at 21:19, Clint William Theron <[email protected]> wrote:
>>
>> Consider the following image and output:
>>
>> [inline image]
>>
>> Tesseract's recognition output:
>> LUHO: R54 MILLION GTD
>> LOTTO PLUS 1: R6,! MILLION est
>> LOTTO PLUS 2: R7,4 MILLION est
>> NIN YOUR SHARE OF R1,! MILLION!!!
>> Buy any NATIONAL LOTTERY t1cket ther
>> SMS :ID,#PLAY,TICKET CODE TO 34909.
>> Cash Prizes to be won!!! T’s and C’
>> apply vtsit National Lottery website
>> PLEASE RETAIN YOUR ENTRY TICKET!
>> First Draw: Saturday 20/07/19
>> VALID RECEIPT FOR 1 Oraw(S)
>> FROM DRAW 1937 To 1937
>> LOTTO PLUS 1: ND
>> LUTTU PLUS 2: ND
>> ‘TotaT:R5.00
>> _‘,{gxt, Inc! 152 VA
>>
>> I'm a newbie when it comes to Tesseract.js. I know there is a way to include config parameters to increase OCR accuracy. In the image above, I'm interested in getting the numbers between the two horizontal dashed lines. Could you suggest a few config parameters to pass to the recognize method that might improve the OCR accuracy?
>>
>> Thank you in advance. PS: Anything would be helpful.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/2a937bf3-8c97-466d-a9bb-26a277e02522%40googlegroups.com .
>
> --
> Never fear, Linux is here.
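
The advice in the quoted reply, to preprocess the image before handing it to Tesseract, can be illustrated with a minimal sketch. It assumes the image is available as RGBA pixel data (the layout a browser canvas exposes through ImageData) and that a simple fixed-threshold binarization is a reasonable first step; the thread does not say which preprocessing was actually used. The function name `binarize` and the default threshold of 128 are hypothetical and are not part of Tesseract.js.

```javascript
// Minimal sketch: turn RGBA pixel data into pure black-and-white pixels.
// `binarize` and `threshold` are illustrative names, not a Tesseract.js API.
function binarize(rgba, threshold = 128) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    // Weighted sum of R, G, B (Rec. 601 luma) approximates perceived brightness.
    const luma = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    // Pixels at or above the threshold become white, all others black.
    const value = luma >= threshold ? 255 : 0;
    out[i] = value;
    out[i + 1] = value;
    out[i + 2] = value;
    out[i + 3] = 255; // force full opacity
  }
  return out;
}
```

In a browser, the input would typically come from `getImageData` on a 2D canvas context, and the result could be written back with `putImageData` before the canvas is passed to the OCR call. The right threshold depends on the scan, so a receipt like the one above may need a different value or an adaptive method.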

