Re: [tesseract-ocr] Guidance for not recognized text

2020-10-01 Thread Jean-Marc Spaggiari
I was curious why it works very well for some white-and-black images and not at all for others. I will try the inversion. Thanks, JMS On Thursday, October 1, 2020 at 12:59:09 UTC-4, Lorenzo Blz wrote: > Invert the image. > > > > On Thu, Oct 1, 2020, 14:58 Jean-Marc Spaggiari > wrote: >

Re: [tesseract-ocr] Re: OCR fails on a preprocessed visually good looking image

2020-10-01 Thread Ger Hobbelt
Hi, AFAICT Tesseract OCR quality deteriorates a lot when it is fed 'inverted colors', i.e. white text on black background. (I can't dig up the Tesseract blog / article where I first saw this mentioned, and Google fails me in this regard right this minute, sorry.) Second, from what I gather from all the ap

Re: [tesseract-ocr] Guidance for not recognized text

2020-10-01 Thread Lorenzo Bolzani
Invert the image. On Thu, Oct 1, 2020, 14:58 Jean-Marc Spaggiari wrote: > Hi, > > I'm playing around with Tesseract to try to do some OCR on screen captures. > > My picture looks like this: > [image: name.png] > > But is recognized like this: > Eglise Chrétienne Evangélique de > sy oan 8)=1
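
The inversion suggested above can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes an 8-bit grayscale image held as a nested list of pixel values, and the name `invert_grayscale` and the sample data are hypothetical. With an imaging library such as Pillow, `ImageOps.invert` performs the same operation on an image object.

```python
def invert_grayscale(pixels):
    """Return a new image with every 8-bit grayscale value v replaced by 255 - v."""
    return [[255 - value for value in row] for row in pixels]


# Hypothetical 2x3 image: white text (255) on a black background (0).
white_on_black = [
    [0, 255, 0],
    [0, 255, 255],
]

# After inversion the text is dark on a light background.
black_on_white = invert_grayscale(white_on_black)
print(black_on_white)
```

Inverting twice returns the original values, so the step is safe to try on a copy before passing the result to Tesseract.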

[tesseract-ocr] Re: OCR fails on a preprocessed visually good looking image

2020-10-01 Thread Jean-Marc Spaggiari
Hi Fabian, Are you able to try removing the camera picture on the left? Or does it have to stay there? Maybe you can split your picture into smaller ones, by looking for vertical delimiters? JM On Wednesday, September 30, 2020 at 06:50:44 UTC-4, fabian...@googlemail.com wrote: > Hello, >

[tesseract-ocr] Guidance for not recognized text

2020-10-01 Thread Jean-Marc Spaggiari
Hi, I'm playing around with Tesseract to try to do some OCR on screen captures. My picture looks like this: [image: name.png] But is recognized like this: Eglise Chrétienne Evangélique de sy oan 8)=1= Place Je Me Souviens, Laval, QC H7L 1T9, ‘Tate lale| Long lines are fine, but short are defin

Re: [tesseract-ocr] OMP_THREAD_LIMIT=1 gives improvement in 4.1 version

2020-10-01 Thread shree
Related discussion at https://github.com/tesseract-ocr/tesseract/issues/3109

[tesseract-ocr] Math formulas in tesseract intractable? Why? ...

2020-10-01 Thread Albretch Mueller
My search on such a topic: https://groups.google.com/g/tesseract-ocr/search?q=%22Math%20formulas%22 gave me only four unhopeful hits: // __ Mathematical Formulae recognition https://groups.google.com/g/tesseract-ocr/c/gh-bficm_2w/m/8xw4F3_sAQAJ ~ // __ Handwritten math formula(symbol) reco

Re: [tesseract-ocr] Tesseract 4.1 disabled openmp support by default - What does it mean?

2020-10-01 Thread Amit Shah
Well, technically that seems to be what they mean: they are removing the OpenMP support that was used, because it was not giving good results anyway. On Wed, Sep 30, 2020 at 4:07 PM Sarath C P wrote: > Hi, we were using Tesseract4.0 in my previous problem. It was giving a > good performance w

Re: [tesseract-ocr] Diacriticals Training

2020-10-01 Thread Shree Devi Kumar
Please read the Tesseract documentation on LSTM training by replacing a layer. On Thu, Oct 1, 2020, 11:29 shreyansh dwivedi wrote: > Hello Shree, > Firstly, thank you for looking into it. Secondly, I would be grateful if > you share the piece of code with the explanation part of how to train

[tesseract-ocr] Optimal text for OCR long strings of characters

2020-10-01 Thread nickname changed with proper one
1) On which fonts made for machine recognition is Tesseract trained by default? 2) If it is not trained by default on any font made for machine recognition, please point me to training data for such a font, if someone has done that. 3) Help me to configure Tesseract not to use any dictionaries, ju

Re: [tesseract-ocr] OMP_THREAD_LIMIT=1 gives improvement in 4.1 version

2020-10-01 Thread Sarath C P
Also, we wanted to know: without OMP_THREAD_LIMIT=1, is tesseract-4.1 running multi-threaded or not? On Thu, Oct 1, 2020 at 11:23 AM Sarath C P wrote: > Hi, >> Please see steps followed. >> >> OS: LINUX, 4 CPUS , 2 CORES > >> Version tesseract - 4.1 >> >> 1. Our python web application running in n
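
For readers who want to reproduce the comparison, one way to set OMP_THREAD_LIMIT=1 for a single Tesseract call from Python is sketched below. This is an assumption-laden illustration rather than the poster's setup: the helper names `single_thread_env` and `ocr_single_threaded` are hypothetical, and the sketch assumes the `tesseract` executable is on the PATH. It does not settle whether a given 4.1 build runs multi-threaded; the variable can only matter if the build was compiled with OpenMP.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Copy an environment and set OMP_THREAD_LIMIT=1 in the copy only."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def ocr_single_threaded(image_path):
    """Run `tesseract <image> stdout` with OpenMP limited to one thread."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        env=single_thread_env(),  # the limit applies to this child process only
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the variable through `env` keeps the limit local to the Tesseract process, so the rest of the web application is unaffected.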