Sorry, I am not familiar with PowerShell and NuGet. If you are on Windows, you can try the experimental 4.0.0alpha binaries for gImageReader, a GUI front-end to Tesseract OCR. You can OCR a PDF directly or load multiple images at the same time.
- excuse the brevity, sent from mobile

On 09-Jan-2017 12:49 PM, "Ludvig F Aarstad" <lud...@aarstad.org> wrote:

> Thanks Shree :D. Really appreciate it. Will this work with v3.03 too? I am
> basing my code on this: https://github.com/jourdant/powershell-paperless
> and there is a script to initialize the environment that is getting the
> tesseract files from here: https://nuget.org/api/v2/package/tesseract-ocr.
> Would you be able to point me in the right direction on how to move this
> from 3.03 to the 4.0alpha?
>
> On Friday, 6 January 2017 at 13:50:38 UTC+1, shree wrote:
>
>> I have uploaded modified nor.traineddata at
>>
>> https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
>>
>> See attached log and info file for commands used in training. It took
>> about 9 hours on my pc - about 1700 iterations only and then my PC froze so
>> I rebooted and created the traineddata for norlayer0.853_1615.lstm i.e.
>> 0.853 % character error rate at iteration number 1615.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shree...@gmail.com>
>> wrote:
>>
>>> @Peter, Have you tried the 4.0.0alpha version yet?
>>>
>>> @Ludvig F. Aarstad - Add a layer training worked for adding 'Æ' - I
>>> will upload the new traineddata so that you can test. You will need
>>> 4.0.alpha version for testing.
>>>
>>> Here is couple of the training tifs and OCRed text.
>>>
>>> ShreeDevi
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Fri, Jan 6, 2017 at 5:01 PM, Peter <pe...@peterkrantz.se> wrote:
>>>
>>>> On Thursday, 5 January 2017 at 04:39:01 UTC+1, shree wrote:
>>>>>
>>>>> Ray is planning to retrain the languages for the new 4.0.0 version
>>>>> sometime in January. So it would be helpful if you could open an issue on
>>>>> https://github.com/tesseract-ocr/langdata/issues with this
>>>>> information.
>>>>>
>>>>
>>>> Is it possible to contribute training data for this effort? I realise
>>>> swedish will not be on top of the list but I think it would be easy to
>>>> involve some of the research community here in contributing training data
>>>> if it could improve the language model.
>>>>
>>>> /Peter