OK, when I run make_training_data, it reports "Other case ø of Ø is not in unicharset". Could this be a problem, even though Ø itself is in the unicharset?
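The message appears to be about the lowercase ø rather than Ø itself. Below is a minimal sketch for checking which of the two has its own entry. It assumes the unicharset is a plain-text file whose first line is the entry count and whose remaining lines each begin with the character followed by a space. The function names in_unicharset and check_case_pair are hypothetical helpers, not part of the tutorial:

```shell
#!/usr/bin/env bash
# Sketch: report whether given characters have their own unicharset entry.
# Assumption: line 1 holds the entry count; every later line starts with
# the character itself, followed by a space and its properties.

in_unicharset() {
  # $1 = character to look for, $2 = path to the unicharset file
  # Succeeds (exit 0) only if some entry line has $1 as its first field.
  awk -v ch="$1" 'NR > 1 && $1 == ch { found = 1 } END { exit !found }' "$2"
}

check_case_pair() {
  # $1 = path to the unicharset file, remaining arguments = characters
  # Prints one "present" or "missing" line per character.
  local file="$1"
  shift
  local ch
  for ch in "$@"; do
    if in_unicharset "$ch" "$file"; then
      echo "$ch: present"
    else
      echo "$ch: missing"
    fi
  done
}
```

Calling check_case_pair path/to/unicharset Ø ø (the path is a placeholder) prints one status line per character, which should show whether only the lowercase form is absent.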
Cheers,
Dustin

On Thursday, October 3, 2019 at 16:52:46 UTC+2, shree wrote:
>
> Can all of the fonts used render Ø?
>
> On Thu, Oct 3, 2019 at 7:59 PM Dustin Theobald <d.th...@gmail.com> wrote:
>
>> I also tried to change the training text with respect to Ø:
>>
>> cat <<EOM >>../langdata/eng/eng.plusminus.training_text
>> alkoxy of LEAVES Ø1.84% by Buying curved RESISTANCE MARKED Your (Vol. SPANIEL
>> TRAVELED Ø85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership
>> Inc. with DESIGNS self; ball changed. MANHATTAN Harvey's Ø1.31 POPSET Os—C(11)
>> VOLVO abdomen, Ø65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri
>> PATENTSCOPE® # © HOME SECOND HAI Business most COLETTI, Ø14¢ Flujo Gilbert
>> Dresdner Yesterday's Dilated SYSTEMS Your FOUR Ø90° Gogol PARTIALLY BOARDS firm
>> Email ACTUAL QUEENSLAND Carl's Unruly Ø8.4 DESTRUCTION customers DataVac® DAY
>> Kollman, for ‘planked’ key max) View «LINK» PRIVACY BY Ø2.96% Ask! WELL
>> Lambert own Company View mg \ (Ø7) SENSOR STUDYING Feb EVENTUALLY [It Yahoo! Tv
>> United by #DEFINE Rebel PERFORMED Ø500Gb Oliver Forums Many | ©2003-2008 Used OF
>> Avoidance Moosejaw pm* Ø18 note: PROBE Jailbroken RAISE Fountains Write Goods (Ø6)
>> Oberflachen source.” CULTURED CUTTING Home 06-13-2008, § Ø44.01189673355 €
>> netting Bookmark of WE MORE) STRENGTH IDENTICAL Ø2? activity PROPERTY MAINTAINED
>> EOM
>>
>> The evaluation on the training data works, but it doesn't recognize any line in evalplusminus/eng.training_files.txt.
>>
>> On Thursday, October 3, 2019 at 13:59:19 UTC+2, Dustin Theobald wrote:
>>>
>>> Thank you, Shree,
>>>
>>> I'm left with the URW Bookman and Century Schoolbook families (which it seems I have to pay for).
>>> For now I'll stick with Linux. Still, thank you very much, Shree!
>>>
>>> I have one more question regarding training:
>>>
>>> I have German and English PDFs (sometimes mixed). I can use multiple languages (deu+eng).
>>> If I fine-tune for a character, do I have to fine-tune both language models (eng.lstm and deu.lstm) and combine them when running tesseract, like this:
>>>
>>> tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
>>>   --oem 1 \
>>>   --psm 3 \
>>>   --tessdata-dir ./tesseract/tessdata/best
>>>
>>> Thank you in advance!
>>>
>>> Cheers,
>>> Dustin
>>>
>>> On Thursday, October 3, 2019 at 10:34:53 UTC+2, shree wrote:
>>>>
>>>> https://apple.stackexchange.com/questions/128091/where-can-i-find-default-microsoft-fonts-calibri-cambria
>>>>
>>>> On Thu, Oct 3, 2019 at 1:33 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>
>>>>> OK. Thank you very much for your help! I'll get it to work somehow!
>>>>>
>>>>> Cheers,
>>>>> Dustin
>>>>>
>>>>> On Wednesday, October 2, 2019 at 16:46:25 UTC+2, shree wrote:
>>>>>>
>>>>>> Sorry, I don't know how to add those fonts on a Mac.
>>>>>>
>>>>>> The tutorial uses the following set of fonts:
>>>>>>
>>>>>> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh#L42
>>>>>>
>>>>>> You could use a similar set of fonts available on the Mac and assign them via fontlist.
>>>>>>
>>>>>> On Wed, Oct 2, 2019 at 7:38 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey shree,
>>>>>>>
>>>>>>> Do you know how to manually install the missing fonts on a Mac, like in your documentation for Linux:
>>>>>>>
>>>>>>> sudo apt update
>>>>>>> sudo apt install ttf-mscorefonts-installer
>>>>>>> sudo apt install fonts-dejavu
>>>>>>> fc-cache -vf
>>>>>>>
>>>>>>> Thank you in advance!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Dustin
>>>>>>>
>>>>>>> On Wednesday, October 2, 2019 at 11:26:28 UTC+2, shree wrote:
>>>>>>>>
>>>>>>>> > This doesn't work on my Mac. I can't find some of the fonts, so I only tried to create training data for Arial. When I use 5-makedata-plusminus.sh, it only renders 2 pages, which seems odd.
>>>>>>>>
>>>>>>>> 2 pages should be OK, because it uses the training_text from the langdata repo, which is around 80 lines, plus the extra lines added with plusminus.
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2019 at 2:53 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> 1. You could install on Linux using the appropriate package from
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki#tesseract-4-packages-with-lstm-engine-and-related-traineddata
>>>>>>>>>
>>>>>>>>> OR
>>>>>>>>>
>>>>>>>>> 2. When building tesseract from git source, follow
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#build-with-training-tools
>>>>>>>>>
>>>>>>>>> You seem to be missing some steps there.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 2, 2019 at 2:32 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Shree,
>>>>>>>>>>
>>>>>>>>>> Thank you for your help!
>>>>>>>>>>
>>>>>>>>>> This doesn't work on my Mac. I can't find some of the fonts, so I only tried to create training data for Arial. When I use 5-makedata-plusminus.sh, it only renders 2 pages, which seems odd.
>>>>>>>>>>
>>>>>>>>>> I'm switching to my Linux machine now, but I have problems installing tesseract.
>>>>>>>>>>
>>>>>>>>>> I'm following the documentation:
>>>>>>>>>>
>>>>>>>>>> sudo apt install tesseract-ocr
>>>>>>>>>>
>>>>>>>>>> Afterwards, I tried to find the folder in which to run
>>>>>>>>>>
>>>>>>>>>> make
>>>>>>>>>> make training
>>>>>>>>>> make training-install
>>>>>>>>>>
>>>>>>>>>> but I cannot find the folder (on Ubuntu).
>>>>>>>>>>
>>>>>>>>>> So I cloned the GitHub repository https://github.com/tesseract-ocr/tesseract to my Desktop and ran ./autogen.sh, ./configure, make, make training, and sudo make training-install.
>>>>>>>>>>
>>>>>>>>>> But then I get the following error when running 5-makedata-plusminus.sh:
>>>>>>>>>>
>>>>>>>>>> /usr/local/bin/text2image: error while loading shared libraries: libtesseract.so.5: cannot open shared object file: No such file or directory
>>>>>>>>>> ERROR: Program text2image failed. Abort.
>>>>>>>>>>
>>>>>>>>>> Thank you very much for your help!
>>>>>>>>>>
>>>>>>>>>> On Tuesday, October 1, 2019 at 17:41:36 UTC+2, shree wrote:
>>>>>>>>>>>
>>>>>>>>>>> specifically https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.log#L429
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 1, 2019 at 9:09 PM Shree Devi Kumar <shree...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> See https://github.com/Shreeshrii/tess4training
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 1, 2019 at 7:53 PM Dustin Theobald <d.th...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Changed my evaluation to:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>>
>>>>>>>>>>>>> Still doesn't work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tuesday, October 1, 2019 at 14:39:48 UTC+2, Dustin Theobald wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a problem when fine-tuning characters (trying the ± approach on https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I'm working on a Mac.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My tesseract version:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> tesseract 5.0.0-alpha-457-gb3b74
>>>>>>>>>>>>>> leptonica-1.78.0
>>>>>>>>>>>>>> libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
>>>>>>>>>>>>>> Found AVX2
>>>>>>>>>>>>>> Found AVX
>>>>>>>>>>>>>> Found FMA
>>>>>>>>>>>>>> Found SSE
>>>>>>>>>>>>>> Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My bash script looks as follows: https://pastebin.com/XK4CkuM2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I evaluate via:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/eng.traineddata \
>>>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't get a single line OCRed correctly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone see a mistake in my code?
>
> --
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/7dda8d63-7722-43d0-96fb-6cb385092773%40googlegroups.com.