OK, when I run make_training_data, it reports "Other case ø of Ø is not in 
unicharset". Could this be a problem, even though Ø is in the unicharset?
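
Read literally, the message is about the lowercase ø (the other case of Ø), 
not about Ø itself. Below is a minimal sketch for checking which of the two 
is actually listed. It assumes the unicharset is a plain-text file with an 
entry count on the first line, followed by one entry per line whose first 
field is the character. The helper name check_unichars and the small demo 
file are made up for illustration and are not part of the training tools.

```shell
#!/bin/sh
# check_unichars FILE CHAR...: report whether each CHAR is the first field
# of some entry in FILE (line 1 is assumed to hold only the entry count).
check_unichars() {
  file=$1
  shift
  for ch in "$@"; do
    if awk -v c="$ch" 'NR > 1 && $1 == c { found = 1 } END { exit !found }' "$file"; then
      echo "$ch: present"
    else
      echo "$ch: missing"
    fi
  done
}

# Tiny stand-in file for illustration only; the property columns are placeholders.
demo=$(mktemp)
printf '%s\n' '3' 'NULL 0 ...' 'Ø 5 ...' 'a 3 ...' > "$demo"
check_unichars "$demo" Ø ø
rm -f "$demo"
```

To use it for real, point it at the unicharset that the training run 
produced instead of the demo file.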

Cheers,
Dustin

On Thursday, October 3, 2019 at 16:52:46 UTC+2, shree wrote:
>
> Can all of the fonts you are using render Ø? 
>
> On Thu, Oct 3, 2019 at 7:59 PM Dustin Theobald <d.th...@gmail.com> wrote:
>
>> I also tried changing the training text to include Ø: 
>>
>> cat <<EOM >>../langdata/eng/eng.plusminus.training_text
>> alkoxy of LEAVES Ø1.84% by Buying curved RESISTANCE MARKED Your (Vol. SPANIEL
>> TRAVELED Ø85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership
>> Inc. with DESIGNS self; ball changed. MANHATTAN Harvey's Ø1.31 POPSET Os—C(11)
>> VOLVO abdomen, Ø65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri
>> PATENTSCOPE® # © HOME SECOND HAI Business most COLETTI, Ø14¢ Flujo Gilbert
>> Dresdner Yesterday's Dilated SYSTEMS Your FOUR Ø90° Gogol PARTIALLY BOARDS firm
>> Email ACTUAL QUEENSLAND Carl's Unruly Ø8.4 DESTRUCTION customers DataVac® DAY
>> Kollman, for ‘planked’ key max) View «LINK» PRIVACY BY Ø2.96% Ask! WELL
>> Lambert own Company View mg \ (Ø7) SENSOR STUDYING Feb EVENTUALLY [It Yahoo! Tv
>> United by #DEFINE Rebel PERFORMED Ø500Gb Oliver Forums Many | ©2003-2008 Used OF
>> Avoidance Moosejaw pm* Ø18 note: PROBE Jailbroken RAISE Fountains Write Goods (Ø6)
>> Oberflachen source.” CULTURED CUTTING Home 06-13-2008, § Ø44.01189673355 €
>> netting Bookmark of WE MORE) STRENGTH IDENTICAL Ø2? activity PROPERTY MAINTAINED
>> EOM
>>
>> Evaluation on the training data works, but it doesn't recognize any 
>> line in evalplusminus/eng.training_files.txt.
>>
>> On Thursday, October 3, 2019 at 13:59:19 UTC+2, Dustin Theobald wrote:
>>>
>>> Thank you, shree.
>>>
>>> I'm left with the URW Bookman and Century Schoolbook families (which it 
>>> seems I would have to pay for). 
>>> For now I'll stick to Linux. Still, thank you very much, shree!
>>>
>>> I have one more question regarding training: 
>>>
>>> I have German and English PDFs (sometimes mixed), and I know I can use 
>>> multiple languages (deu+eng). If I fine-tune for a character, do I have to 
>>> fine-tune both language models (eng.lstm and deu.lstm) and combine them 
>>> when running tesseract, like this: 
>>>
>>> tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
>>> --oem 1 \
>>> --psm 3 \
>>> --tessdata-dir ./tesseract/tessdata/best
>>>
>>> Thank you in advance!
>>>
>>> Cheers, 
>>> Dustin
>>>
>>> On Thursday, October 3, 2019 at 10:34:53 UTC+2, shree wrote:
>>>>
>>>>
>>>> https://apple.stackexchange.com/questions/128091/where-can-i-find-default-microsoft-fonts-calibri-cambria
>>>>   
>>>>
>>>> On Thu, Oct 3, 2019 at 1:33 PM Dustin Theobald <d.th...@gmail.com> 
>>>> wrote:
>>>>
>>>>> Ok. Thank you very much for your help! I'll get it to work somehow! 
>>>>>
>>>>> Cheers,
>>>>> Dustin
>>>>>
>>>>> On Wednesday, October 2, 2019 at 16:46:25 UTC+2, shree wrote:
>>>>>>
>>>>>> Sorry, don't know how to add those fonts for Mac.
>>>>>>
>>>>>> The tutorial uses the following set of fonts:
>>>>>>
>>>>>> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh#L42
>>>>>>  
>>>>>>
>>>>>> You could use a similar set of fonts that are available on the Mac and 
>>>>>> assign them via fontlist. 
>>>>>>
>>>>>> On Wed, Oct 2, 2019 at 7:38 PM Dustin Theobald <d.th...@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> Hey shree, 
>>>>>>>
>>>>>>> Do you know how to manually install the missing fonts on a Mac, as in 
>>>>>>> your documentation for Linux: 
>>>>>>>
>>>>>>> sudo apt update
>>>>>>> sudo apt install ttf-mscorefonts-installer
>>>>>>> sudo apt install fonts-dejavu
>>>>>>> fc-cache -vf
>>>>>>>
>>>>>>> Thank you in advance!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Dustin
>>>>>>>
>>>>>>> On Wednesday, October 2, 2019 at 11:26:28 UTC+2, shree wrote:
>>>>>>>>
>>>>>>>> > This doesn't work on my Mac. I can't find some of the fonts, so I 
>>>>>>>> > only tried to create training data for Arial. When I use 
>>>>>>>> > 5-makedata-plusminus.sh, it only renders 2 pages, which seems odd.
>>>>>>>>
>>>>>>>> 2 pages should be OK, because it uses the training_text from the 
>>>>>>>> langdata repo, which is around 80 lines, plus the extra lines added 
>>>>>>>> for plusminus.
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2019 at 2:53 PM Shree Devi Kumar <shree...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> 1. You could install on linux using the appropriate package from 
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki#tesseract-4-packages-with-lstm-engine-and-related-traineddata
>>>>>>>>>
>>>>>>>>> OR
>>>>>>>>>
>>>>>>>>> 2. When building tesseract from git source, follow 
>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#build-with-training-tools
>>>>>>>>>
>>>>>>>>> You seem to be missing some steps there.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 2, 2019 at 2:32 PM Dustin Theobald <d.th...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Shree, 
>>>>>>>>>>
>>>>>>>>>> Thank you for your help!
>>>>>>>>>>
>>>>>>>>>> This doesn't work on my Mac. I can't find some of the fonts, so I 
>>>>>>>>>> only tried to create training data for Arial. When I use 
>>>>>>>>>> 5-makedata-plusminus.sh, it only renders 2 pages, which seems odd.
>>>>>>>>>>
>>>>>>>>>> I'm switching to my Linux machine now, but I have problems 
>>>>>>>>>> installing tesseract. 
>>>>>>>>>>
>>>>>>>>>> I'm following the documentation:
>>>>>>>>>>
>>>>>>>>>> sudo apt install tesseract-ocr
>>>>>>>>>>
>>>>>>>>>> After that, I tried to find the folder in which to run 
>>>>>>>>>>
>>>>>>>>>> make 
>>>>>>>>>> make training 
>>>>>>>>>> make training-install
>>>>>>>>>>
>>>>>>>>>> But I cannot find the folder (on Ubuntu).
>>>>>>>>>>
>>>>>>>>>> So I cloned the GitHub repository 
>>>>>>>>>> https://github.com/tesseract-ocr/tesseract
>>>>>>>>>> to my Desktop and ran ./autogen.sh, ./configure, make, make 
>>>>>>>>>> training, sudo make training-install
>>>>>>>>>>
>>>>>>>>>> But then I'll get the following error when running 
>>>>>>>>>> 5-makedata-plusminus.sh:
>>>>>>>>>>
>>>>>>>>>> /usr/local/bin/text2image: error while loading shared libraries: 
>>>>>>>>>> libtesseract.so.5: cannot open shared object file: No such file or 
>>>>>>>>>> directory
>>>>>>>>>> ERROR: Program text2image failed. Abort.
>>>>>>>>>>
>>>>>>>>>> Thank you very much for your help!
>>>>>>>>>>
>>>>>>>>>> On Tuesday, October 1, 2019 at 17:41:36 UTC+2, shree wrote:
>>>>>>>>>>>
>>>>>>>>>>> specifically 
>>>>>>>>>>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.log#L429
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 1, 2019 at 9:09 PM Shree Devi Kumar <
>>>>>>>>>>> shree...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> See https://github.com/Shreeshrii/tess4training
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 1, 2019 at 7:53 PM Dustin Theobald <
>>>>>>>>>>>> d.th...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Changed my evaluation to: 
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>>
>>>>>>>>>>>>> Still doesn't work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tuesday, October 1, 2019 at 14:39:48 UTC+2, Dustin Theobald wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey guys, 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a problem when fine-tuning for characters (trying the ± 
>>>>>>>>>>>>>> approach from 
>>>>>>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I'm working on a Mac.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My tesseract version: 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> tesseract 5.0.0-alpha-457-gb3b74
>>>>>>>>>>>>>>  leptonica-1.78.0
>>>>>>>>>>>>>>   libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
>>>>>>>>>>>>>>  Found AVX2
>>>>>>>>>>>>>>  Found AVX
>>>>>>>>>>>>>>  Found FMA
>>>>>>>>>>>>>>  Found SSE
>>>>>>>>>>>>>>  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My bash script looks as follows: https://pastebin.com/XK4CkuM2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I evaluate via: 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/eng.traineddata \
>>>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Not a single line is recognized correctly. 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anyone see a mistake in my code? 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from 
>>>>>>>>>>>>> it, send an email to tesser...@googlegroups.com.
>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com
>>>>>>>>>>>>>  
>>>>>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>> .
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>>
>>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>>>>

