[tesseract-ocr] Re: Creating Starter Traineddata

Simon Fri, 19 Jan 2024 06:27:20 -0800

OK, it turned out I was getting "no entry point found" errors from the DLL files.
Reinstalling Tesseract solved the problem.


Now I have run into another interesting problem.

combine_lang_model --input_unicharset 
C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset 
--script_dir 
C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng --lang 
--output_dir C:/Users/LCAdmin/Documents/FineTuning/output

When I run this command, Tesseract tries to load many script unicharsets, 
and I don't understand why.
What is the reason for loading all these unicharsets:

Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Latin.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Inherited.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Unknown.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Greek.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Armenian.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Arabic.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Devanagari.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Gujarati.unicharset
Failed to load script unicharset 
from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Bopomofo.unicharset

when I only want to train the English model?
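
For reference, a minimal sketch (Python, not part of Tesseract; the function names are made up for illustration) that takes log lines like the ones above and checks whether each named file exists at the reported location or one directory higher. It assumes the message format "Failed to load script unicharset from:<path>" and only inspects the file system.

```python
import re
from pathlib import Path

# Matches the path in messages such as
# "Failed to load script unicharset from:C:/.../eng/Latin.unicharset".
# \s* tolerates the line break that mail wrapping puts before "from:".
FROM_RE = re.compile(r"from:\s*(\S+\.unicharset)")


def missing_script_files(log_text):
    """Return the unicharset paths named in the 'Failed to load' messages."""
    return [Path(p) for p in FROM_RE.findall(log_text)]


def locate(paths):
    """For each path, report whether the file exists there or one directory up."""
    report = {}
    for path in paths:
        one_level_up = path.parent.parent / path.name
        if path.is_file():
            report[path.name] = "found in script_dir"
        elif one_level_up.is_file():
            report[path.name] = "found one level up"
        else:
            report[path.name] = "not found"
    return report
```

In the command above, Latin.unicharset is passed from langdata/, while the log looks for it under langdata/eng/, so a "found one level up" result would suggest that --script_dir points one directory too deep. That is an assumption to verify, not a confirmed diagnosis.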

Another question came up: 
When I want to train some new characters, do I have to add them to 
Latin.unicharset before I create the starter traineddata, or do I add 
them to the generated unicharset after creating the starter 
traineddata?

Simon wrote on Friday, 19 January 2024 at 10:38:24 UTC+1:

> Here is a link to the website of Uni Mannheim: COMBINE_LANG_MODEL - 
> generate starter traineddata 
> <https://digi.bib.uni-mannheim.de/tesseract/manuals/combine_lang_model.1.html>
>
> Unfortunately, the command doesn't create any files, and after running it 
> I get no feedback on why it didn't work properly. 
> Even when I purposely use nonexistent paths, I still get no error message!
>
> PS C:\Windows\system32> combine_lang_model --input_unicharset 
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset 
> --script_dir 
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng  --lang eng 
> --wordlist 
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/eng.wordlist 
> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
> PS C:\Users\LCAdmin\Documents\FineTuning>
>
> PS C:\Users\LCAdmin\Documents\FineTuning> combine_lang_model 
> --input_unicharset tesstutorial/langdata/Latin.unicharset --script_dir 
> tesstutorial/langdata/eng  --lang eng --wordlist 
> asdfasfdef/langdata/eng/eng.wordlist --output_dir output
> PS C:\Users\LCAdmin\Documents\FineTuning>
>
> Does anyone have an idea how I can get log messages or 
> anything else that would show why it didn't work?
>
>
>
> Simon wrote on Thursday, 18 January 2024 at 11:11:52 UTC+1:
>
>> Hello everybody,
>>
>> I have a question regarding "Fine Tuning +- a few characters". 
>>
>> In general the instructions on 
>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for--a-few-characters
>>  
>> say that you have to make a starter traineddata from the unicharset, but is 
>> this also required if I want to fine tune? 
>>
>> Furthermore, I have no idea how to create a starter 
>> traineddata. I read the "Creating Starter Traineddata" chapter, but it is 
>> still not clear to me how to do it. The site is supposed to be a tutorial, 
>> so I expected step-by-step instructions. 
>>
>> Can anyone help me with this?
>>
>> I am a newbie at Tesseract training, so I would appreciate any help.
>>
>
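
Regarding the quoted question about the command printing nothing: one way to get at least the exit code and any captured output is to launch the tool from a small wrapper. The sketch below is Python with a hypothetical helper name, run_and_report; it uses only the standard subprocess module and assumes nothing about Tesseract beyond the arguments already shown in the quoted command.

```python
import subprocess


def run_and_report(argv):
    """Run a command and return (exit code, stdout text, stderr text)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Example call, with arguments copied from the quoted command
# (left commented out so the sketch runs without Tesseract installed):
#
# code, out, err = run_and_report([
#     "combine_lang_model",
#     "--input_unicharset", "tesstutorial/langdata/Latin.unicharset",
#     "--script_dir", "tesstutorial/langdata/eng",
#     "--lang", "eng",
#     "--output_dir", "output",
# ])
# print("exit code:", code)
# print("stdout:", out)
# print("stderr:", err)
```

A nonzero exit code with empty output would at least confirm that the program stopped early. In PowerShell, the exit code of the last native program is also available as $LASTEXITCODE, which can be checked right after running combine_lang_model directly.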
