Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-08 Thread Ludvig F Aarstad
No worries, I will play around and see what I can get working. For now I am using a simple replace in my script to handle the Æ. How would I go about compiling Tesseract 4.0 alpha using git and cmake? The wiki says the 4.0 alpha source code is available in the master branch of the

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-08 Thread ShreeDevi Kumar
Sorry, I am not familiar with PowerShell and NuGet. If you are on Windows, you can try the experimental 4.0.0-alpha binaries of gImageReader, a GUI front end to Tesseract. You can OCR a PDF directly or load multiple images at the same time.

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-08 Thread Ludvig F Aarstad
Thanks Shree :D. Really appreciate it. Will this work with v3.03 too? I am basing my code on https://github.com/jourdant/powershell-paperless, which includes a script that initializes the environment by fetching the Tesseract files from https://nuget.org/api/v2/package/tesseract-ocr.

[tesseract-ocr] Re: Swedish language

2017-01-08 Thread ShreeDevi Kumar
Testing with TIFFs created from the training text, accuracy seems quite good for Swedish using the 4.0.0-alpha traineddata. Please see the attached eval reports. ShreeDevi