Thank you, Shree,

I'm left with the URW Bookman and Century Schoolbook families (which it seems I 
have to pay for). 
For now I'll stick to Linux. Still, thank you very much, Shree!

I have one more question regarding training: 

I have German and English PDFs (sometimes mixed), so I use multiple 
languages (deu+eng). If I fine-tune for a character, do I have to fine-tune 
both language models (eng.lstm and deu.lstm) and combine them when running 
tesseract, like this: 

tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
--oem 1 \
--psm 3 \
--tessdata-dir ./tesseract/tessdata/best
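
This thread doesn't settle whether both models need fine-tuning. If you do end up with two fine-tuned models, though, every name in the -l list has to exist as <name>.traineddata in the directory passed to --tessdata-dir. Below is a minimal pre-flight check. It assumes the model names eng_plusminus and deu_plusminus from the command above, and `check_lang_models` is a hypothetical helper, not a tesseract tool:

```shell
# Hypothetical pre-flight check (not part of tesseract): verify that every
# model named in a tesseract "-l" string such as "eng_plusminus+deu_plusminus"
# exists as <name>.traineddata in the given tessdata directory.
check_lang_models() {
  local tessdata_dir="$1" langs="$2" lang missing=0
  local IFS='+'              # split the -l string on "+"
  for lang in $langs; do
    if [ ! -f "$tessdata_dir/$lang.traineddata" ]; then
      echo "missing: $tessdata_dir/$lang.traineddata" >&2
      missing=1
    fi
  done
  return "$missing"          # 0 only if every model file was found
}

# Usage:
#   check_lang_models ./tesseract/tessdata/best eng_plusminus+deu_plusminus &&
#     tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
#       --oem 1 --psm 3 --tessdata-dir ./tesseract/tessdata/best
```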

Thank you in advance!

Cheers, 
Dustin

On Thursday, October 3, 2019 at 10:34:53 UTC+2, shree wrote:
>
>
> https://apple.stackexchange.com/questions/128091/where-can-i-find-default-microsoft-fonts-calibri-cambria
>   
>
> On Thu, Oct 3, 2019 at 1:33 PM Dustin Theobald <d.th...@gmail.com> wrote:
>
>> Ok. Thank you very much for your help! I'll get it to work somehow! 
>>
>> Cheers,
>> Dustin
>>
>> On Wednesday, October 2, 2019 at 16:46:25 UTC+2, shree wrote:
>>>
>>> Sorry, I don't know how to add those fonts on a Mac.
>>>
>>> The tutorial uses the following set of fonts:
>>>
>>> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh#L42
>>>  
>>>
>>> You could use a similar set of fonts that are available on the Mac and 
>>> assign them via fontlist. 
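
Following that suggestion means taking the intersection of the tutorial's font list and the fonts you actually have. Here is a sketch. It assumes you already have a list of installed font names, one per line (for example, derived from text2image's --list_available_fonts output, which may need its numbering stripped first). `pick_fonts` is a hypothetical helper name:

```shell
# Hypothetical helper: print, in the order requested, each wanted font name
# that also appears (exact, whole-line match) in the list of available fonts
# read from stdin, one name per line.
pick_fonts() {
  local available font
  available="$(cat)"
  for font in "$@"; do
    if grep -Fqx -- "$font" <<< "$available"; then
      printf '%s\n' "$font"
    fi
  done
}

# Usage (font names are examples):
#   printf 'Arial\nArial Bold\nHelvetica\n' |
#     pick_fonts "Arial Bold" "Century Schoolbook L Bold" "Arial"
#   prints "Arial Bold" and "Arial"; use those names in your fontlist.
```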
>>>
>>> On Wed, Oct 2, 2019 at 7:38 PM Dustin Theobald <d.th...@gmail.com> 
>>> wrote:
>>>
>>>> Hey shree, 
>>>>
>>>> do you know how to manually install the missing fonts on a Mac, like in 
>>>> your documentation for Linux: 
>>>>
>>>> sudo apt update
>>>> sudo apt install ttf-mscorefonts-installer
>>>> sudo apt install fonts-dejavu
>>>> fc-cache -vf
>>>>
>>>> Thank you in advance!
>>>>
>>>> Best regards,
>>>> Dustin
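
macOS has no apt, but the manual equivalent is to copy the font files into a directory and point text2image (or tesstrain.sh) at that directory with the fonts_dir option. The sketch below assumes you already have the .ttf/.otf files. `install_fonts` is a hypothetical name; ~/Library/Fonts is the standard per-user font folder on macOS:

```shell
# Hypothetical helper: copy .ttf/.otf files from SRC into DEST, refresh the
# fontconfig cache for DEST when fc-cache is installed, and print how many
# files were copied.
install_fonts() {
  local src="$1" dest="$2" f copied=0
  mkdir -p "$dest"
  for f in "$src"/*.ttf "$src"/*.otf; do
    [ -e "$f" ] || continue            # glob matched nothing
    cp -- "$f" "$dest"/
    copied=$((copied + 1))
  done
  if command -v fc-cache >/dev/null 2>&1; then
    fc-cache -f "$dest" >/dev/null 2>&1 || true   # best effort
  fi
  echo "$copied"
}

# Usage on a Mac (source folder is an example):
#   install_fonts ~/Downloads/fonts ~/Library/Fonts
#   then pass --fonts_dir ~/Library/Fonts to tesstrain.sh or text2image
```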
>>>>
>>>> On Wednesday, October 2, 2019 at 11:26:28 UTC+2, shree wrote:
>>>>>
>>>>> >This doesn't work on my Mac. I can't find some of the fonts, so I 
>>>>> only tried to create training data for Arial. If I use 
>>>>> 5-makedata-plusminus.sh, it only renders (creating 2 pages), which 
>>>>> seems odd.
>>>>>
>>>>> 2 pages should be OK because it uses the training_text from the langdata 
>>>>> repo, which is around 80 lines, plus the extra lines added for plusminus.
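
To check that arithmetic on your own setup, count the lines that go into the render. Here is a sketch. `count_training_lines` is a hypothetical helper, langdata/eng/eng.training_text follows the langdata repo layout, and the name of the file holding the extra plusminus lines is a placeholder:

```shell
# Hypothetical helper: total number of newline-terminated lines across the
# given training text files (wc pads its output on macOS, so strip spaces).
count_training_lines() {
  cat -- "$@" | wc -l | tr -d '[:space:]'
}

# Usage (second file name is a placeholder):
#   count_training_lines langdata/eng/eng.training_text extra_plusminus.txt
```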
>>>>>
>>>>> On Wed, Oct 2, 2019 at 2:53 PM Shree Devi Kumar <shree...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> 1. You could install on Linux using the appropriate package from 
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki#tesseract-4-packages-with-lstm-engine-and-related-traineddata
>>>>>>
>>>>>> OR
>>>>>>
>>>>>> 2. When building tesseract from git source, follow 
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#build-with-training-tools
>>>>>>
>>>>>> You seem to be missing some steps there.
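
Compare this with the steps listed further down this thread (make, make training, sudo make training-install). As I understand it, the wiki's build-with-training-tools sequence also runs sudo make install and sudo ldconfig before the training targets. Those two steps install libtesseract and refresh the loader cache, and text2image needs both at run time. Below is a sketch of that order; verify it against the wiki page. `build_tesseract` and DRY_RUN are hypothetical names:

```shell
# Build order for tesseract plus training tools from a git checkout, as I
# read the wiki's "build with training tools" section (verify on the page).
# Set DRY_RUN=1 to print the steps instead of running them.
tesseract_build_steps=(
  "./autogen.sh"
  "./configure"
  "make"
  "sudo make install"   # installs libtesseract, which text2image links against
  "sudo ldconfig"       # refresh the shared-library cache (Linux)
  "make training"
  "sudo make training-install"
)

build_tesseract() {
  local step
  for step in "${tesseract_build_steps[@]}"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf '%s\n' "$step"
    else
      printf '+ %s\n' "$step" >&2
      eval "$step" || return 1   # fixed strings above; stop at first failure
    fi
  done
}

# Usage, from the root of the tesseract checkout:
#   DRY_RUN=1 build_tesseract   # review the steps
#   build_tesseract             # run them
```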
>>>>>>
>>>>>> On Wed, Oct 2, 2019 at 2:32 PM Dustin Theobald <d.th...@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Shree, 
>>>>>>>
>>>>>>> Thank you for your help!
>>>>>>>
>>>>>>> This doesn't work on my Mac. I can't find some of the fonts, so I 
>>>>>>> only tried to create training data for Arial. If I use 
>>>>>>> 5-makedata-plusminus.sh, it only renders (creating 2 pages), which 
>>>>>>> seems odd.
>>>>>>>
>>>>>>> I'm switching to my Linux machine now, but I'm having problems 
>>>>>>> installing tesseract. 
>>>>>>>
>>>>>>> I'm following the documentation:
>>>>>>>
>>>>>>> sudo apt install tesseract-ocr
>>>>>>>
>>>>>>> Afterwards, I tried to find the folder in which to run 
>>>>>>>
>>>>>>> make 
>>>>>>> make training 
>>>>>>> make training-install
>>>>>>>
>>>>>>> but I can't find the folder (on Ubuntu).
>>>>>>>
>>>>>>> So I cloned the GitHub repository 
>>>>>>> https://github.com/tesseract-ocr/tesseract
>>>>>>> to my Desktop and ran ./autogen.sh, ./configure, make, make training 
>>>>>>> and sudo make training-install.
>>>>>>>
>>>>>>> But then I get the following error when running 
>>>>>>> 5-makedata-plusminus.sh:
>>>>>>>
>>>>>>> /usr/local/bin/text2image: error while loading shared libraries: 
>>>>>>> libtesseract.so.5: cannot open shared object file: No such file or 
>>>>>>> directory
>>>>>>> ERROR: Program text2image failed. Abort.
>>>>>>>
>>>>>>> Thank you very much for your help!
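
That message comes from the dynamic loader: text2image is installed, but libtesseract.so.5 is not in the shared-library cache. ldd marks libraries like this with "=> not found". Below is a small filter that lists them, assuming GNU/Linux ldd output. `missing_libs` is a hypothetical name. If libtesseract shows up and was installed under /usr/local/lib, running sudo ldconfig (after sudo make install) usually resolves it.

```shell
# Hypothetical filter: read `ldd` output on stdin and print the names of
# shared libraries the dynamic loader could not resolve.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /usr/local/bin/text2image | missing_libs
```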
>>>>>>>
>>>>>>> On Tuesday, October 1, 2019 at 17:41:36 UTC+2, shree wrote:
>>>>>>>>
>>>>>>>> specifically 
>>>>>>>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.log#L429
>>>>>>>>
>>>>>>>> On Tue, Oct 1, 2019 at 9:09 PM Shree Devi Kumar <shree...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> See https://github.com/Shreeshrii/tess4training
>>>>>>>>>
>>>>>>>>> On Tue, Oct 1, 2019 at 7:53 PM Dustin Theobald <d.th...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I changed my evaluation command to: 
>>>>>>>>>>
>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>
>>>>>>>>>> Still doesn't work.
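
When lstmeval produces nothing useful, one thing worth ruling out is the list file itself. eng.training_files.txt should contain one .lstmf path per line, and every path must exist; relative paths are resolved from the directory you run lstmeval in. Here is a sketch of that check. `check_listfile` is a hypothetical helper, not a tesseract tool:

```shell
# Hypothetical helper: report entries in a training/eval list file
# (one .lstmf path per line) that do not exist on disk.
check_listfile() {
  local listfile="$1" path missing=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue         # ignore blank lines
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done < "$listfile"
  return "$missing"                    # 0 only if every entry exists
}

# Usage:
#   check_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt
```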
>>>>>>>>>>
>>>>>>>>>> On Tuesday, October 1, 2019 at 14:39:48 UTC+2, Dustin Theobald 
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey guys, 
>>>>>>>>>>>
>>>>>>>>>>> I have a problem when fine-tuning characters (trying the ± approach 
>>>>>>>>>>> from 
>>>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>>>>>>>>>> ).
>>>>>>>>>>>
>>>>>>>>>>> (I'm working on a Mac.)
>>>>>>>>>>>
>>>>>>>>>>> My tesseract version: 
>>>>>>>>>>>
>>>>>>>>>>> tesseract 5.0.0-alpha-457-gb3b74
>>>>>>>>>>>  leptonica-1.78.0
>>>>>>>>>>>   libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
>>>>>>>>>>>  Found AVX2
>>>>>>>>>>>  Found AVX
>>>>>>>>>>>  Found FMA
>>>>>>>>>>>  Found SSE
>>>>>>>>>>>  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>>>>>>>>>
>>>>>>>>>>> My bash script looks as follows: https://pastebin.com/XK4CkuM2
>>>>>>>>>>>
>>>>>>>>>>> When I evaluate with:
>>>>>>>>>>>
>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/eng.traineddata \
>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>
>>>>>>>>>>> I don't get a single OCR line right.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone see a mistake in my code? 
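
For reference, the ± fine-tuning steps from the wiki tutorial have three stages: extract the LSTM model with combine_tessdata, continue training with lstmtraining, then convert the checkpoint with --stop_training. They can be written as a per-language command generator, which would also cover the eng + deu question at the top of this thread. This is a sketch from memory of the wiki's trainplusminus example. The helper name, argument order, directory layout and the default of 3600 iterations are assumptions, so check them against the wiki page and your own script. Use a separate work directory per language, because the model_output prefix "plusminus" would otherwise be shared.

```shell
# Hypothetical helper: print (not run) the fine-tuning commands for one
# language, modelled on the wiki's trainplusminus example. Review the
# output, then pipe it to `bash` to execute it.
print_finetune_cmds() {
  local lang="$1"            # e.g. eng or deu
  local work="$2"            # per-language work dir, e.g. $HOME/tesstutorial/trainplusminus
  local best="$3"            # directory holding the original ${lang}.traineddata
  local iters="${4:-3600}"   # iteration count (assumed from the wiki example)

  printf 'combine_tessdata -e %s/%s.traineddata %s/%s.lstm\n' \
    "$best" "$lang" "$work" "$lang"
  printf 'lstmtraining --model_output %s/plusminus --continue_from %s/%s.lstm --traineddata %s/%s/%s.traineddata --old_traineddata %s/%s.traineddata --train_listfile %s/%s.training_files.txt --max_iterations %s\n' \
    "$work" "$work" "$lang" "$work" "$lang" "$lang" "$best" "$lang" "$work" "$lang" "$iters"
  printf 'lstmtraining --stop_training --continue_from %s/plusminus_checkpoint --traineddata %s/%s/%s.traineddata --model_output %s/%s_plusminus.traineddata\n' \
    "$work" "$work" "$lang" "$lang" "$work" "$lang"
}

# Usage (paths are examples; use $HOME rather than a quoted "~"):
#   print_finetune_cmds deu "$HOME/tesstutorial/trainplusminus_deu" "$HOME/tesseract/tessdata/best"
```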
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>>> send an email to tesser...@googlegroups.com.
>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com
>>>>>>>>>>  
>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>>
>>>>>>>>> ____________________________________________________________
>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>

