Do you perhaps have an answer for this one? https://groups.google.com/forum/#!topic/tesseract-ocr/W0e9iusQmi4

On Saturday, August 31, 2019 at 11:18:12 PM UTC+2, Clint William Theron wrote:
>
> Thanks. I understand. Which tesseract do you have experience with? On
> Windows 10 I'm able to replace the eng.traineddata file with my own, and
> then tesseract uses my language. That is what I'm looking for, but it has
> to be something online (not local).
>
> On Saturday, August 31, 2019 at 8:17:25 PM UTC+2, René Hansen wrote:
>>
>> Can't help you there, I'm afraid. I have no experience with tesseract.js.
>>
>> /René
>>
>> On Sat, 31 Aug 2019 at 17:28, Clint William Theron <[email protected]> wrote:
>>
>>> Thanks for your response. I already tried your suggestions, and now and
>>> then I get the desired result. What I'm looking to do now is train
>>> tesseract, but I can't get tesseract to use my traineddata language. My
>>> app is a browser web app that runs on an Apache HTTP server. I would
>>> appreciate it if you could answer my SO question:
>>>
>>> https://stackoverflow.com/questions/57715343/how-do-i-specify-traineddata-language-path-and-language-code-when-using-tesser
>>>
>>> Thanks
>>>
>>> On Friday, August 30, 2019, René Hansen <[email protected]> wrote:
>>> > A few config params won't do the trick. You need to preprocess the
>>> > image. Make sure you read this:
>>> > https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>>> >
>>> > Ideally, I think you need to cook down the image you give tesseract to
>>> > something like this:
>>> >
>>> > [embedded image]
>>> >
>>> > Even this isn't quite good enough, though. I get "NG: 1020452" as a
>>> > result from https://tesseract.projectnaptha.com
>>> >
>>> > You might need to train on this specific font to get better results,
>>> > or do further preprocessing to increase accuracy.
>>> >
>>> > /René
>>> >
>>> > On Fri, 30 Aug 2019 at 21:19, Clint William Theron <[email protected]> wrote:
>>> >>
>>> >> Consider the following image and output:
>>> >>
>>> >> [embedded image]
>>> >>
>>> >> Tesseract's recognition output:
>>> >> LUHO: R54 MILLION GTD
>>> >> LOTTO PLUS 1: R6,! MILLION est
>>> >> LOTTO PLUS 2: R7,4 MILLION est
>>> >> NIN YOUR SHARE OF R1,! MILLION!!!
>>> >> Buy any NATIONAL LOTTERY t1cket ther
>>> >> SMS :ID,#PLAY,TICKET CODE TO 34909.
>>> >> Cash Prizes to be won!!! T’s and C’
>>> >> apply vtsit National Lottery website
>>> >> PLEASE RETAIN YOUR ENTRY TICKET!
>>> >> First Draw: Saturday 20/07/19
>>> >> VALID RECEIPT FOR 1 Oraw(S)
>>> >> FROM DRAW 1937 To 1937
>>> >> LOTTO PLUS 1: ND
>>> >> LUTTU PLUS 2: ND
>>> >> ‘TotaT:R5.00
>>> >> _‘,{gxt, Inc! 152 VA
>>> >>
>>> >> I'm a newbie when it comes to Tesseract.js. I know there is a way to
>>> >> include config parameters to increase the accuracy of the OCR. In the
>>> >> above image, I'm interested in getting the numbers between the two
>>> >> horizontal dashed stripes. Could you suggest a few config parameters
>>> >> to include in the recognize method to see whether they improve the
>>> >> OCR accuracy?
>>> >> Thank you in advance. P.S. Anything would be helpful.
>>> >>
>>> >> --
>>> >> You received this message because you are subscribed to the Google
>>> >> Groups "tesseract-ocr" group.
>>> >> To unsubscribe from this group and stop receiving emails from it,
>>> >> send an email to [email protected].
>>> >> To view this discussion on the web visit
>>> >> https://groups.google.com/d/msgid/tesseract-ocr/2a937bf3-8c97-466d-a9bb-26a277e02522%40googlegroups.com
>>> >
>>> > --
>>> > Never fear, Linux is here.
>>
>> --
>> Never fear, Linux is here.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5a000dd0-4eb0-422b-8559-46bd3aa7a037%40googlegroups.com.

