Can all of the fonts used render Ø?

On Thu, Oct 3, 2019 at 7:59 PM Dustin Theobald <d.theo1...@gmail.com> wrote:
> I also tried to change the training text with respect to Ø:
>
> cat <<EOM >>../langdata/eng/eng.plusminus.training_text
> alkoxy of LEAVES Ø1.84% by Buying curved RESISTANCE MARKED Your (Vol. SPANIEL
> TRAVELED Ø85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership
> Inc. with DESIGNS self; ball changed. MANHATTAN Harvey's Ø1.31 POPSET Os—C(11)
> VOLVO abdomen, Ø65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri
> PATENTSCOPE® # © HOME SECOND HAI Business most COLETTI, Ø14¢ Flujo Gilbert
> Dresdner Yesterday's Dilated SYSTEMS Your FOUR Ø90° Gogol PARTIALLY BOARDS firm
> Email ACTUAL QUEENSLAND Carl's Unruly Ø8.4 DESTRUCTION customers DataVac® DAY
> Kollman, for ‘planked’ key max) View «LINK» PRIVACY BY Ø2.96% Ask! WELL
> Lambert own Company View mg \ (Ø7) SENSOR STUDYING Feb EVENTUALLY [It Yahoo! Tv
> United by #DEFINE Rebel PERFORMED Ø500Gb Oliver Forums Many | ©2003-2008 Used OF
> Avoidance Moosejaw pm* Ø18 note: PROBE Jailbroken RAISE Fountains Write Goods (Ø6)
> Oberflachen source.” CULTURED CUTTING Home 06-13-2008, § Ø44.01189673355 €
> netting Bookmark of WE MORE) STRENGTH IDENTICAL Ø2? activity PROPERTY MAINTAINED
> EOM
>
> The evaluation on the training data works, but it doesn't recognize any
> line in evalplusminus/eng.training_files.txt.
>
> On Thursday, October 3, 2019 at 13:59:19 UTC+2, Dustin Theobald wrote:
>>
>> Thank you Shree,
>>
>> I'm left with URW Bookman and the Century Schoolbook family (which it seems I
>> have to pay for).
>> For now I'll stick to Linux. Still, thank you very much, Shree!
>>
>> I have one more question regarding training:
>>
>> I have German and English PDFs (sometimes mixed). I can use multiple
>> languages (deu+eng).
>> If I fine-tune for a character, do I have to fine-tune
>> both language models, eng.lstm + deu.lstm, and combine them when using
>> tesseract, like:
>>
>> tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
>>   --oem 1 \
>>   --psm 3 \
>>   --tessdata-dir ./tesseract/tessdata/best
>>
>> Thank you in advance!
>>
>> Cheers,
>> Dustin
>>
>> On Thursday, October 3, 2019 at 10:34:53 UTC+2, shree wrote:
>>>
>>> https://apple.stackexchange.com/questions/128091/where-can-i-find-default-microsoft-fonts-calibri-cambria
>>>
>>> On Thu, Oct 3, 2019 at 1:33 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>
>>>> Ok. Thank you very much for your help! I'll get it to work somehow!
>>>>
>>>> Cheers,
>>>> Dustin
>>>>
>>>> On Wednesday, October 2, 2019 at 16:46:25 UTC+2, shree wrote:
>>>>>
>>>>> Sorry, don't know how to add those fonts for Mac.
>>>>>
>>>>> The tutorial uses the following set of fonts:
>>>>>
>>>>> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh#L42
>>>>>
>>>>> You could use a similar set of fonts available on the Mac and assign
>>>>> via fontlist.
>>>>>
>>>>> On Wed, Oct 2, 2019 at 7:38 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>
>>>>>> Hey shree,
>>>>>>
>>>>>> do you know how to manually install the missing fonts on a Mac, like
>>>>>> in your documentation for Linux:
>>>>>>
>>>>>> sudo apt update
>>>>>> sudo apt install ttf-mscorefonts-installer
>>>>>> sudo apt install fonts-dejavu
>>>>>> fc-cache -vf
>>>>>>
>>>>>> Thank you in advance!
>>>>>>
>>>>>> Best regards,
>>>>>> Dustin
>>>>>>
>>>>>> On Wednesday, October 2, 2019 at 11:26:28 UTC+2, shree wrote:
>>>>>>>
>>>>>>> > This doesn't work on my Mac. I can't find some of the fonts, so I
>>>>>>> > only try to create training data for Arial. If I use
>>>>>>> > 5-makedata-plusminus.sh, it only renders (creating 2 pages), which
>>>>>>> > seems odd.
>>>>>>>
>>>>>>> 2 pages should be ok because it uses the training_text from the langdata
>>>>>>> repo, which is around 80 lines, plus the extra lines added with plusminus.
>>>>>>>
>>>>>>> On Wed, Oct 2, 2019 at 2:53 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>>>
>>>>>>>> 1. You could install on linux using the appropriate package from
>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki#tesseract-4-packages-with-lstm-engine-and-related-traineddata
>>>>>>>>
>>>>>>>> OR
>>>>>>>>
>>>>>>>> 2. When building tesseract from git source, follow
>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#build-with-training-tools
>>>>>>>>
>>>>>>>> You seem to be missing some steps there.
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2019 at 2:32 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey Shree,
>>>>>>>>>
>>>>>>>>> Thank you for your help!
>>>>>>>>>
>>>>>>>>> This doesn't work on my Mac. I can't find some of the fonts, so I
>>>>>>>>> only try to create training data for Arial. If I use
>>>>>>>>> 5-makedata-plusminus.sh, it only renders (creating 2 pages), which
>>>>>>>>> seems odd.
>>>>>>>>>
>>>>>>>>> I'm switching to my Linux machine now, but I have problems installing
>>>>>>>>> tesseract.
>>>>>>>>>
>>>>>>>>> I'm following the documentation:
>>>>>>>>>
>>>>>>>>> sudo apt install tesseract-ocr
>>>>>>>>>
>>>>>>>>> Afterwards, I try to find the folder in which to run
>>>>>>>>>
>>>>>>>>> make
>>>>>>>>> make training
>>>>>>>>> make training-install
>>>>>>>>>
>>>>>>>>> But I cannot find the folder (on Ubuntu).
>>>>>>>>>
>>>>>>>>> So I clone the GitHub repository
>>>>>>>>> https://github.com/tesseract-ocr/tesseract
>>>>>>>>> to my Desktop and run ./autogen.sh, ./configure, make, make
>>>>>>>>> training, sudo make training-install.
>>>>>>>>>
>>>>>>>>> But then I get the following error when running
>>>>>>>>> 5-makedata-plusminus.sh:
>>>>>>>>>
>>>>>>>>> /usr/local/bin/text2image: error while loading shared libraries: libtesseract.so.5: cannot open shared object file: No such file or directory
>>>>>>>>> ERROR: Program text2image failed. Abort.
>>>>>>>>>
>>>>>>>>> Thank you very much for your help!
>>>>>>>>>
>>>>>>>>> On Tuesday, October 1, 2019 at 17:41:36 UTC+2, shree wrote:
>>>>>>>>>>
>>>>>>>>>> specifically
>>>>>>>>>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.log#L429
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 1, 2019 at 9:09 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> See https://github.com/Shreeshrii/tess4training
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 1, 2019 at 7:53 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Changed my evaluation to:
>>>>>>>>>>>>
>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>
>>>>>>>>>>>> Still doesn't work.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, October 1, 2019 at 14:39:48 UTC+2, Dustin Theobald wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a problem when fine-tuning characters (trying the ± approach
>>>>>>>>>>>>> on https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 )
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I'm working on a Mac)
>>>>>>>>>>>>>
>>>>>>>>>>>>> My tesseract version:
>>>>>>>>>>>>>
>>>>>>>>>>>>> tesseract 5.0.0-alpha-457-gb3b74
>>>>>>>>>>>>> leptonica-1.78.0
>>>>>>>>>>>>> libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
>>>>>>>>>>>>> Found AVX2
>>>>>>>>>>>>> Found AVX
>>>>>>>>>>>>> Found FMA
>>>>>>>>>>>>> Found SSE
>>>>>>>>>>>>> Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>>>>>>>>>>>
>>>>>>>>>>>>> My bash script looks as follows: https://pastebin.com/XK4CkuM2
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I evaluate via:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/eng.traineddata \
>>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't get any line OCRed correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does someone see a mistake in my code?
>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to tesser...@googlegroups.com.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
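
One way to follow up on the question at the top of the thread (whether every font can render Ø) is to check two things before rendering: that the training text really contains the character, and which installed fonts cover it. The snippet below is only a minimal sketch and was not posted by anyone in the thread. The helper names `count_char_lines` and `fonts_with_char` are hypothetical, and the font check assumes fontconfig's `fc-list` is installed and accepts a `:charset=` pattern with a hex code point (U+00D8 for Ø).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of Tesseract.

# Print how many lines of a training text contain the given character.
count_char_lines() {
  local char="$1" file="$2"
  # -F: match a fixed string, -c: count matching lines.
  # grep exits with status 1 when nothing matches, so ignore that status;
  # it still prints "0" in that case.
  grep -c -F -- "$char" "$file" || true
}

# List the font families that fontconfig reports as covering a code point.
# Assumption: fc-list is available and understands ":charset=<hex>".
fonts_with_char() {
  local codepoint_hex="$1"
  if command -v fc-list >/dev/null 2>&1; then
    fc-list ":charset=${codepoint_hex}" family | sort -u
  else
    echo "fc-list not found; install fontconfig to run this check" >&2
    return 1
  fi
}
```

For example, `count_char_lines 'Ø' ../langdata/eng/eng.plusminus.training_text` shows how many training lines contain the character, and `fonts_with_char 00D8` lists the families that claim to cover it. A font passed to the rendering step that does not appear in that list is a likely candidate for missing glyphs in the generated training images.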