Thank you, Shree. I'm left with the URW Bookman and Century Schoolbook families (which, it seems, I have to pay for), so for now I'll stick to Linux. Still, thank you very much, Shree!
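A minimal sketch for checking which families from a wanted font list are actually installed before handing them to the training scripts. It assumes fontconfig's `fc-list` is available (on macOS it is usually not there by default and is commonly added via Homebrew) and compares family names only. `missing_fonts` is a hypothetical helper, not part of Tesseract.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each wanted font family that is NOT installed.
# Installed family names are read from stdin, e.g. from `fc-list : family`
# (fontconfig), which may print several comma-separated names per line.
missing_fonts() {
  local installed font
  # Put every family name on its own line.
  installed=$(tr ',' '\n')
  for font in "$@"; do
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! printf '%s\n' "$installed" | grep -Fxq -- "$font"; then
      printf '%s\n' "$font"
    fi
  done
}
```

For example, `fc-list : family | missing_fonts Arial "DejaVu Sans"` prints every listed family that fontconfig does not report. text2image works with Pango font descriptions such as "Arial Bold", so a family found here may still need its exact style name in the fontlist.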
I have one more question regarding training: I have German and English PDFs (sometimes mixed). I can use multiple languages (deu+eng). If I fine-tune for a character, do I have to fine-tune both language models, eng.lstm and deu.lstm, and then combine them when running tesseract, like this:

tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
  --oem 1 \
  --psm 3 \
  --tessdata-dir ./tesseract/tessdata/best

Thank you in advance!

Cheers,
Dustin

On Thursday, October 3, 2019 10:34:53 UTC+2, shree wrote:
>
> https://apple.stackexchange.com/questions/128091/where-can-i-find-default-microsoft-fonts-calibri-cambria
>
> On Thu, Oct 3, 2019 at 1:33 PM Dustin Theobald <d.th...@gmail.com> wrote:
>
>> Ok. Thank you very much for your help! I'll get it to work somehow!
>>
>> Cheers,
>> Dustin
>>
>> On Wednesday, October 2, 2019 16:46:25 UTC+2, shree wrote:
>>>
>>> Sorry, I don't know how to add those fonts on a Mac.
>>>
>>> The tutorial uses the following set of fonts:
>>>
>>> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh#L42
>>>
>>> You could use a similar set of fonts that is available on the Mac and
>>> assign it via fontlist.
>>>
>>> On Wed, Oct 2, 2019 at 7:38 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>
>>>> Hey shree,
>>>>
>>>> do you know how to manually install the missing fonts on a Mac, like in
>>>> your docs for Linux:
>>>>
>>>> sudo apt update
>>>> sudo apt install ttf-mscorefonts-installer
>>>> sudo apt install fonts-dejavu
>>>> fc-cache -vf
>>>>
>>>> Thank you in advance!
>>>>
>>>> Best regards,
>>>> Dustin
>>>>
>>>> On Wednesday, October 2, 2019 11:26:28 UTC+2, shree wrote:
>>>>>
>>>>> > This doesn't work on my Mac. I can't find some of the fonts, so I
>>>>> > only try to create training data for Arial. If I use
>>>>> > 5-makedata-plusminus.sh, it only renders (creates 2 pages), which
>>>>> > seems odd.
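Whatever the answer to the fine-tuning question, a combined `-l eng_plusminus+deu_plusminus` run can only work if each named model exists as `<name>.traineddata` in the directory passed to `--tessdata-dir`. Here is a small sketch that checks this and builds the `+`-joined value. The helper names `join_langs` and `missing_models` are hypothetical, and the model names and directory are simply the ones from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a multi-model run such as
#   tesseract page.png stdout -l eng_plusminus+deu_plusminus --tessdata-dir DIR

# join_langs NAME... : print the names joined with '+', the form -l expects.
join_langs() {
  local IFS='+'
  printf '%s\n' "$*"
}

# missing_models DIR NAME... : print every NAME that has no DIR/NAME.traineddata.
missing_models() {
  local dir=$1 name
  shift
  for name in "$@"; do
    [ -f "$dir/$name.traineddata" ] || printf '%s\n' "$name"
  done
}
```

With both files present, `missing_models ./tesseract/tessdata/best eng_plusminus deu_plusminus` prints nothing, and `-l "$(join_langs eng_plusminus deu_plusminus)"` expands to the same `-l eng_plusminus+deu_plusminus` as in the question.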
>>>>>
>>>>> 2 pages should be OK, because it uses the training_text from the langdata
>>>>> repo, which is around 80 lines, plus the extra lines added with plusminus.
>>>>>
>>>>> On Wed, Oct 2, 2019 at 2:53 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>
>>>>>> 1. You could install on Linux using the appropriate package from
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki#tesseract-4-packages-with-lstm-engine-and-related-traineddata
>>>>>>
>>>>>> OR
>>>>>>
>>>>>> 2. When building tesseract from git source, follow
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#build-with-training-tools
>>>>>>
>>>>>> You seem to be missing some steps there.
>>>>>>
>>>>>> On Wed, Oct 2, 2019 at 2:32 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Shree,
>>>>>>>
>>>>>>> Thank you for your help!
>>>>>>>
>>>>>>> This doesn't work on my Mac. I can't find some of the fonts, so I
>>>>>>> only try to create training data for Arial. If I use
>>>>>>> 5-makedata-plusminus.sh, it only renders (creates 2 pages), which
>>>>>>> seems odd.
>>>>>>>
>>>>>>> I'm switching to my Linux machine now, but I have problems installing
>>>>>>> tesseract.
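The build-with-training-tools section of the GitInstallation page linked above comes down roughly to the sequence below. Compared with the steps Dustin lists in this message, it also runs `sudo make install` and `sudo ldconfig`. Skipping those is a likely cause of the `libtesseract.so.5: cannot open shared object file` error: text2image gets installed, but the library it links against does not, or it is not yet in the linker cache. This is a sketch only. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers for previewing the commands and are not part of the wiki.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an autotools build of tesseract with training tools.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_tesseract_with_training() {
  # Run from the root of the cloned tesseract repository; stop at the first failure.
  run ./autogen.sh || return
  run ./configure || return
  run make || return
  run sudo make install || return       # installs libtesseract itself
  run sudo ldconfig || return           # refresh the shared-library cache (Linux)
  run make training || return
  run sudo make training-install        # installs text2image, lstmtraining, ...
}
```

`DRY_RUN=1 build_tesseract_with_training` prints the seven commands in order without executing anything.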
>>>>>>>
>>>>>>> I'm following the documentation:
>>>>>>>
>>>>>>> sudo apt install tesseract-ocr
>>>>>>>
>>>>>>> Afterwards, I try to find the folder in which to run
>>>>>>>
>>>>>>> make
>>>>>>> make training
>>>>>>> make training-install
>>>>>>>
>>>>>>> but I cannot find the folder (on Ubuntu).
>>>>>>>
>>>>>>> So I cloned the GitHub repository
>>>>>>> https://github.com/tesseract-ocr/tesseract
>>>>>>> to my Desktop and ran ./autogen.sh, ./configure, make, make training and
>>>>>>> sudo make training-install.
>>>>>>>
>>>>>>> But then I get the following error when running
>>>>>>> 5-makedata-plusminus.sh:
>>>>>>>
>>>>>>> /usr/local/bin/text2image: error while loading shared libraries:
>>>>>>> libtesseract.so.5: cannot open shared object file: No such file or
>>>>>>> directory
>>>>>>> ERROR: Program text2image failed. Abort.
>>>>>>>
>>>>>>> Thank you very much for your help!
>>>>>>>
>>>>>>> On Tuesday, October 1, 2019 17:41:36 UTC+2, shree wrote:
>>>>>>>>
>>>>>>>> Specifically:
>>>>>>>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.log#L429
>>>>>>>>
>>>>>>>> On Tue, Oct 1, 2019 at 9:09 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> See https://github.com/Shreeshrii/tess4training
>>>>>>>>>
>>>>>>>>> On Tue, Oct 1, 2019 at 7:53 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I changed my evaluation to:
>>>>>>>>>>
>>>>>>>>>> /usr/local/bin/lstmeval \
>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>
>>>>>>>>>> It still doesn't work.
>>>>>>>>>>
>>>>>>>>>> On Tuesday, October 1, 2019 14:39:48 UTC+2, Dustin Theobald wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey guys,
>>>>>>>>>>>
>>>>>>>>>>> I have a problem when fine-tuning characters (trying the ± approach
>>>>>>>>>>> on https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00).
>>>>>>>>>>>
>>>>>>>>>>> (I'm working on a Mac.)
>>>>>>>>>>>
>>>>>>>>>>> My tesseract version:
>>>>>>>>>>>
>>>>>>>>>>> tesseract 5.0.0-alpha-457-gb3b74
>>>>>>>>>>>  leptonica-1.78.0
>>>>>>>>>>>   libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
>>>>>>>>>>>  Found AVX2
>>>>>>>>>>>  Found AVX
>>>>>>>>>>>  Found FMA
>>>>>>>>>>>  Found SSE
>>>>>>>>>>>  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>>>>>>>>>
>>>>>>>>>>> My bash script looks as follows: https://pastebin.com/XK4CkuM2
>>>>>>>>>>>
>>>>>>>>>>> When I evaluate via:
>>>>>>>>>>>
>>>>>>>>>>> /usr/local/bin/lstmeval \
>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/eng.traineddata \
>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>
>>>>>>>>>>> I don't get a single OCR line right.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone see a mistake in my code?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to tesser...@googlegroups.com.
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> ____________________________________________________________
>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
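On the evaluation side: the second lstmeval command suggests the fine-tuned weights are in `plusminus_checkpoint`. If so, one way to get a standalone model that works with both lstmeval and `tesseract -l` is to convert that checkpoint with `lstmtraining --stop_training`, as described on the TrainingTesseract-4.00 wiki page. The sketch below does this, evaluates the result, and copies it into the tessdata directory under the name `eng_plusminus` from the question. The paths follow this thread. It assumes `trainplusminus/eng/eng.traineddata` is the same `--traineddata` that was passed to lstmtraining during fine-tuning. `finish_plusminus`, `run`, `WORK`, `TESSDATA` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: convert the plusminus checkpoint into a recognition
# model, evaluate it, and install it as eng_plusminus.traineddata.
# WORK and TESSDATA default to the paths used in this thread.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

finish_plusminus() {
  local work=${WORK:-$HOME/Desktop/tesstutorial/trainplusminus}
  local tessdata=${TESSDATA:-./tesseract/tessdata/best}
  local out=$work/eng_plusminus.traineddata

  # Training checkpoint -> standalone .traineddata (assumes the starter
  # traineddata below is the one the fine-tuning run was given).
  run lstmtraining --stop_training \
    --continue_from "$work/plusminus_checkpoint" \
    --traineddata "$work/eng/eng.traineddata" \
    --model_output "$out" || return

  # A finished .traineddata can be passed to --model without --traineddata.
  run lstmeval --model "$out" \
    --eval_listfile "$work/eng.training_files.txt" || return

  # Copy it where `tesseract --tessdata-dir "$tessdata" -l eng_plusminus` looks.
  run cp "$out" "$tessdata/"
}
```

When running lstmeval by hand, the `2>&1 | grep ±` filter from the original commands can still be appended to show only the lines containing ±.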