No worries, I will play around and see what I can get working. For now I am using a simple replace in my script to handle the Æ. How would I go about compiling Tesseract 4.0 alpha using git and CMake? The wiki says the 4.0 alpha source code is available in the master branch of the repository, but I have yet to find it... The compiling part seems straightforward enough, but I need the source ;).
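A minimal sketch of the clone-and-build steps, written in Python so that each command is explicit. It assumes the upstream repository URL below, that git, CMake (3.13 or newer, for the `-S`/`-B` options), a C++ compiler and Leptonica are already installed, and that CMake can find Leptonica on its own. The helper names `build_commands` and `run` are hypothetical. In January 2017 the 4.0 alpha code lived on `master`, but `master` has moved on since then, so you may need to pass a different branch or tag as `ref`.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed upstream repository URL (not stated in the thread).
REPO_URL = "https://github.com/tesseract-ocr/tesseract.git"


def build_commands(src_dir: Path, build_dir: Path, ref: str = "master",
                   config: str = "Release") -> List[List[str]]:
    """Hypothetical helper: the command lines to fetch and build Tesseract.

    Steps: clone the repository, check out `ref`, configure an
    out-of-source build with CMake, then compile it.
    """
    return [
        ["git", "clone", REPO_URL, str(src_dir)],
        ["git", "-C", str(src_dir), "checkout", ref],
        # -S/-B need CMake 3.13+; Leptonica must be findable by CMake.
        ["cmake", "-S", str(src_dir), "-B", str(build_dir)],
        # --config picks the configuration for multi-config generators
        # such as Visual Studio.
        ["cmake", "--build", str(build_dir), "--config", config],
    ]


def run(commands: List[List[str]]) -> None:
    """Hypothetical helper: run each step in order, stopping at the first failure."""
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run(build_commands(Path("tesseract"), Path("tesseract") / "build"))` would clone into `./tesseract` and build in `./tesseract/build`. Keeping the command list separate from execution makes it easy to print or adjust the steps before anything is run.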
I tried installing gImageReader, hoping it would give me the DLL for Tesseract 4.0, but no luck.

On Monday, 9 January 2017 08:34:18 UTC+1, shree wrote:
> Sorry, I am not familiar with PowerShell and NuGet.
>
> If you are on Windows, you can try the experimental 4.0.0alpha binaries of
> gImageReader, a GUI front-end to Tesseract OCR. You can OCR a PDF directly
> or load multiple images at the same time.
>
> - excuse the brevity, sent from mobile
>
> On 09-Jan-2017 12:49 PM, "Ludvig F Aarstad" <lud...@aarstad.org> wrote:
>
>> Thanks Shree :D. Really appreciate it. Will this work with v3.03 too? I
>> am basing my code on https://github.com/jourdant/powershell-paperless,
>> which has a script to initialize the environment that fetches the
>> Tesseract files from https://nuget.org/api/v2/package/tesseract-ocr.
>> Could you point me in the right direction on how to move this from 3.03
>> to 4.0alpha?
>>
>> On Friday, 6 January 2017 13:50:38 UTC+1, shree wrote:
>>
>>> I have uploaded a modified nor.traineddata at
>>>
>>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
>>>
>>> See the attached log and info file for the commands used in training. It
>>> took about 9 hours on my PC, only about 1700 iterations, and then my PC
>>> froze, so I rebooted and created the traineddata from
>>> norlayer0.853_1615.lstm, i.e. a 0.853% character error rate at
>>> iteration 1615.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>>
>>> On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shree...@gmail.com>
>>> wrote:
>>>
>>>> @Peter, have you tried the 4.0.0alpha version yet?
>>>>
>>>> @Ludvig F. Aarstad - "Add a layer" training worked for adding 'Æ'. I
>>>> will upload the new traineddata so that you can test it. You will need
>>>> the 4.0 alpha version for testing.
>>>>
>>>> Here are a couple of the training TIFFs and the OCRed text.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>>>>
>>>>> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>>>>
>>>>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>>>>> sometime in January, so it would be helpful if you could open an issue
>>>>>> at https://github.com/tesseract-ocr/langdata/issues with this
>>>>>> information.
>>>>>
>>>>> Is it possible to contribute training data for this effort? I realise
>>>>> Swedish will not be at the top of the list, but I think it would be easy
>>>>> to involve some of the research community here in contributing training
>>>>> data if it would improve the language model.
>>>>>
>>>>> /Peter
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/9788db26-bb8a-4861-b29e-80db2b5a687f%40googlegroups.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.