Can all used fonts render Ø?
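One way to answer this is to ask fontconfig which font families contain U+00D8 and compare them with the fonts used for training and evaluation. This assumes fontconfig's `fc-list` is available: it usually is on Linux, but on a Mac it may have to be installed separately. The sketch below is minimal, and `fonts_missing_glyph` is a hypothetical helper, not a tesseract tool:

```sh
#!/bin/sh
# Hypothetical helper (not part of tesseract): print each wanted font family
# that does NOT appear in fontconfig's list of families covering a glyph.
# stdin: output of e.g. `fc-list ':charset=d8' family` (U+00D8 = Ø).
# fc-list can print several comma-separated names on one line, so split
# them first; only plain family names are compared, not styles.
fonts_missing_glyph() {
  covered=$(tr ',' '\n' | sed 's/^ *//; s/ *$//' | sort -u)
  for font in "$@"; do
    if ! printf '%s\n' "$covered" | grep -Fqx -- "$font"; then
      printf '%s\n' "$font"
    fi
  done
}
```

For example, `fc-list ':charset=d8' family | fonts_missing_glyph "Arial" "DejaVu Sans"` prints every family in that list that fontconfig does not report as covering Ø. The comparison uses family names only, so a fontlist entry such as "Arial Bold" should be checked as "Arial".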

On Thu, Oct 3, 2019 at 7:59 PM Dustin Theobald <d.theo1...@gmail.com> wrote:

> I also tried changing the training text to target Ø:
>
> cat <<EOM >>../langdata/eng/eng.plusminus.training_text
> alkoxy of LEAVES Ø1.84% by Buying curved RESISTANCE MARKED Your (Vol. SPANIEL
> TRAVELED Ø85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership
> Inc. with DESIGNS self; ball changed. MANHATTAN Harvey's Ø1.31 POPSET Os—C(11)
> VOLVO abdomen, Ø65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri
> PATENTSCOPE® # © HOME SECOND HAI Business most COLETTI, Ø14¢ Flujo Gilbert
> Dresdner Yesterday's Dilated SYSTEMS Your FOUR Ø90° Gogol PARTIALLY BOARDS firm
> Email ACTUAL QUEENSLAND Carl's Unruly Ø8.4 DESTRUCTION customers DataVac® DAY
> Kollman, for ‘planked’ key max) View «LINK» PRIVACY BY Ø2.96% Ask! WELL
> Lambert own Company View mg \ (Ø7) SENSOR STUDYING Feb EVENTUALLY [It Yahoo! Tv
> United by #DEFINE Rebel PERFORMED Ø500Gb Oliver Forums Many | ©2003-2008 Used OF
> Avoidance Moosejaw pm* Ø18 note: PROBE Jailbroken RAISE Fountains Write Goods (Ø6)
> Oberflachen source.” CULTURED CUTTING Home 06-13-2008, § Ø44.01189673355 €
> netting Bookmark of WE MORE) STRENGTH IDENTICAL Ø2? activity PROPERTY MAINTAINED
> EOM
>
> The evaluation on the training data works, but it doesn't recognize any
> line in evalplusminus/eng.training_files.txt.
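If every eval line fails, it is worth ruling out a stale or broken list file before looking at the model itself. The sketch below is minimal. It assumes the list file holds one `.lstmf` path per line, as the tutorial's `eng.training_files.txt` does, and `missing_lstmf` is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: print every entry of a list file (one .lstmf path per
# line, like eng.training_files.txt) that does not exist on disk.
missing_lstmf() {
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue          # skip blank lines
    [ -f "$path" ] || printf '%s\n' "$path"
  done < "$1"
}
```

Running `missing_lstmf ~/Desktop/tesstutorial/evalplusminus/eng.training_files.txt` and getting no output means every listed file exists. A non-empty result suggests the eval data needs to be regenerated.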
>
> On Thursday, October 3, 2019 at 13:59:19 UTC+2, Dustin Theobald wrote:
>>
>> Thank you Shree,
>>
>> I'm left with the URW Bookman and Century Schoolbook families (which, it
>> seems, I have to pay for).
>> For now I'll stick to Linux. Still, thank you very much, Shree!
>>
>> I have one more question regarding training:
>>
>> I have German and English PDFs (sometimes mixed), and I can use multiple
>> languages (deu+eng). If I fine-tune for a character, do I have to fine-tune
>> both language models (eng.lstm and deu.lstm) and combine them when using
>> tesseract, like:
>>
>> tesseract ~/Desktop/test.png stdout -l eng_plusminus+deu_plusminus \
>> --oem 1 \
>> --psm 3 \
>> --tessdata-dir ./tesseract/tessdata/best
>>
>> Thank you in advance!
>>
>> Cheers,
>> Dustin
>>
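The thread does not settle whether both models need fine-tuning. If you do fine-tune both and combine them, tesseract looks for one `<lang>.traineddata` file per `+`-separated name in the `--tessdata-dir` directory. A quick pre-flight check can therefore save a confusing failure. The sketch below is minimal, `missing_traineddata` is a hypothetical helper, and the file names simply follow the command above:

```sh
#!/bin/sh
# Hypothetical helper: for a '+'-joined -l value such as
# eng_plusminus+deu_plusminus, print each language that has no
# <lang>.traineddata file in the given tessdata directory.
missing_traineddata() {
  langs=$1
  dir=$2
  printf '%s\n' "$langs" | tr '+' '\n' | while IFS= read -r lang; do
    [ -n "$lang" ] || continue
    [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
  done
}
```

For example, `missing_traineddata eng_plusminus+deu_plusminus ./tesseract/tessdata/best` prints nothing when both files are in place.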
>> On Thursday, October 3, 2019 at 10:34:53 UTC+2, shree wrote:
>>>
>>>
>>> https://apple.stackexchange.com/questions/128091/where-can-i-find-default-microsoft-fonts-calibri-cambria
>>>
>>>
>>> On Thu, Oct 3, 2019 at 1:33 PM Dustin Theobald <d.th...@gmail.com>
>>> wrote:
>>>
>>>> Ok. Thank you very much for your help! I'll get it to work somehow!
>>>>
>>>> Cheers,
>>>> Dustin
>>>>
>>>> On Wednesday, October 2, 2019 at 16:46:25 UTC+2, shree wrote:
>>>>>
>>>>> Sorry, don't know how to add those fonts for Mac.
>>>>>
>>>>> The tutorial uses the following set of fonts:
>>>>>
>>>>> https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh#L42
>>>>>
>>>>>
>>>>> You could use a similar set of fonts available on the Mac and pass them
>>>>> via --fontlist.
>>>>>
>>>>> On Wed, Oct 2, 2019 at 7:38 PM Dustin Theobald <d.th...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey Shree,
>>>>>>
>>>>>> do you know how to manually install the missing fonts on a Mac, like
>>>>>> in your documentation for Linux:
>>>>>>
>>>>>> sudo apt update
>>>>>> sudo apt install ttf-mscorefonts-installer
>>>>>> sudo apt install fonts-dejavu
>>>>>> fc-cache -vf
>>>>>>
>>>>>> Thank you in advance!
>>>>>>
>>>>>> Best regards,
>>>>>> Dustin
>>>>>>
>>>>>> On Wednesday, October 2, 2019 at 11:26:28 UTC+2, shree wrote:
>>>>>>>
>>>>>>> >This doesn't work on my Mac. I can't find some of the fonts, so I'm
>>>>>>> only trying to create training data for Arial. If I use
>>>>>>> 5-makedata-plusminus.sh, it only renders (creating 2 pages), which
>>>>>>> seems odd.
>>>>>>>
>>>>>>> Two pages should be OK because it uses the training_text from the
>>>>>>> langdata repo, which is around 80 lines, plus the extra lines added for
>>>>>>> plusminus.
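To see how much text actually goes into rendering, and whether the added lines made it into the file, you can count the lines directly. The sketch below is minimal. The helper name is hypothetical, and it should be pointed at whichever training_text file your script renders:

```sh
#!/bin/sh
# Hypothetical helper: print "<total lines> <lines containing CHAR>" for a
# training_text file, e.g. to confirm the appended plusminus lines are there.
# Note: wc -l counts newline characters, so a final line without a trailing
# newline is not counted.
char_line_stats() {
  file=$1
  char=$2
  total=$(wc -l < "$file" | tr -d ' ')
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  with_char=$(grep -cF -- "$char" "$file" || true)
  printf '%s %s\n' "$total" "$with_char"
}
```

For example, `char_line_stats ../langdata/eng/eng.training_text "±"` prints the total line count followed by the number of lines that contain ±.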
>>>>>>>
>>>>>>> On Wed, Oct 2, 2019 at 2:53 PM Shree Devi Kumar <shree...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> 1. You could install on Linux using the appropriate package from
>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki#tesseract-4-packages-with-lstm-engine-and-related-traineddata
>>>>>>>>
>>>>>>>> OR
>>>>>>>>
>>>>>>>> 2. When building tesseract from git source, follow
>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation#build-with-training-tools
>>>>>>>>
>>>>>>>> You seem to be missing some steps there.
>>>>>>>>
>>>>>>>> On Wed, Oct 2, 2019 at 2:32 PM Dustin Theobald <d.th...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey Shree,
>>>>>>>>>
>>>>>>>>> Thank you for your help!
>>>>>>>>>
>>>>>>>>> This doesn't work on my Mac. I can't find some of the fonts, so I'm
>>>>>>>>> only trying to create training data for Arial. If I use
>>>>>>>>> 5-makedata-plusminus.sh, it only renders (creating 2 pages), which
>>>>>>>>> seems odd.
>>>>>>>>>
>>>>>>>>> I'm switching to my Linux machine now, but I have problems installing
>>>>>>>>> tesseract.
>>>>>>>>>
>>>>>>>>> I'm following the documentation:
>>>>>>>>>
>>>>>>>>> sudo apt install tesseract-ocr
>>>>>>>>>
>>>>>>>>> Afterwards, I try to find the folder in which to run
>>>>>>>>>
>>>>>>>>> make
>>>>>>>>> make training
>>>>>>>>> make training-install
>>>>>>>>>
>>>>>>>>> But I cannot find the folder (on Ubuntu).
>>>>>>>>>
>>>>>>>>> So I clone the GitHub repository
>>>>>>>>> https://github.com/tesseract-ocr/tesseract
>>>>>>>>> to my Desktop and run ./autogen.sh, ./configure, make, make
>>>>>>>>> training, and sudo make training-install.
>>>>>>>>>
>>>>>>>>> But then I get the following error when running
>>>>>>>>> 5-makedata-plusminus.sh:
>>>>>>>>>
>>>>>>>>> /usr/local/bin/text2image: error while loading shared libraries: libtesseract.so.5: cannot open shared object file: No such file or directory
>>>>>>>>> ERROR: Program text2image failed. Abort.
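On Linux, a common cause of this error is that libtesseract was never installed, or that the dynamic linker cache was not refreshed after installing it. The steps listed above include `make training-install` but not `sudo make install` or `sudo ldconfig`. As far as I know, the build-from-source wiki page runs both of these before the training targets. The sketch below is a minimal way to confirm which libraries are unresolved, and `unresolved_libs` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: read `ldd` output (Linux) on stdin and print the
# shared libraries the dynamic loader could not resolve.
unresolved_libs() {
  awk '/=> not found/ { print $1 }'
}
```

For example, run `ldd /usr/local/bin/text2image | unresolved_libs`. If it prints `libtesseract.so.5`, the usual remedy is to run `sudo make install` and then `sudo ldconfig` from the build directory. `ldd` is Linux-specific; macOS has `otool -L` instead, whose output format differs, so the awk pattern above does not apply there.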
>>>>>>>>>
>>>>>>>>> Thank you very much for your help!
>>>>>>>>>
>>>>>>>>> On Tuesday, October 1, 2019 at 17:41:36 UTC+2, shree wrote:
>>>>>>>>>>
>>>>>>>>>> specifically
>>>>>>>>>> https://github.com/Shreeshrii/tess4training/blob/master/6-plusminus.log#L429
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 1, 2019 at 9:09 PM Shree Devi Kumar <
>>>>>>>>>> shree...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> See https://github.com/Shreeshrii/tess4training
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 1, 2019 at 7:53 PM Dustin Theobald <
>>>>>>>>>>> d.th...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Changed my evaluation to:
>>>>>>>>>>>>
>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/plusminus_checkpoint \
>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>
>>>>>>>>>>>> Still doesn't work.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, October 1, 2019 at 14:39:48 UTC+2, Dustin Theobald wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a problem when fine-tuning characters (trying the ± approach on
>>>>>>>>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00).
>>>>>>>>>>>>>
>>>>>>>>>>>>> (I'm working on a Mac.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> My tesseract version:
>>>>>>>>>>>>>
>>>>>>>>>>>>> tesseract 5.0.0-alpha-457-gb3b74
>>>>>>>>>>>>>
>>>>>>>>>>>>>  leptonica-1.78.0
>>>>>>>>>>>>>
>>>>>>>>>>>>>   libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 :
>>>>>>>>>>>>> zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Found AVX2
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Found AVX
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Found FMA
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Found SSE
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6
>>>>>>>>>>>>>
>>>>>>>>>>>>> My bash script looks as follows: https://pastebin.com/XK4CkuM2
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I evaluate via:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~/../../usr/local/bin/lstmeval \
>>>>>>>>>>>>>   --model ~/Desktop/tesstutorial/trainplusminus/eng.traineddata \
>>>>>>>>>>>>>   --traineddata ~/Desktop/tesstutorial/trainplusminus/eng/eng.traineddata \
>>>>>>>>>>>>>   --eval_listfile ~/Desktop/tesstutorial/trainplusminus/eng.training_files.txt 2>&1 | grep ±
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't get a single OCR line correct.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does someone see a mistake in my code?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from
>>>>>>>>>>>> it, send an email to tesser...@googlegroups.com.
>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/e9ba2635-6308-41a8-8150-e5d4da520269%40googlegroups.com
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> ____________________________________________________________
>>>>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>



