[tesseract-ocr] Re: I have a question about making a traineddata (tesseract 4.0 LSTM)

Kristóf Horváth Wed, 30 Jan 2019 00:21:09 -0800


On Thursday, March 1, 2018 at 5:02:00 UTC+1, 이경준 wrote:
>
> Hi 
>
> I have a question about making a traineddata (tesseract 4.0 LSTM)
>
> Tutorial Guide to lstmtraining: Creating Starter Traineddata
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
>
> NOTE: This is a new step!
>
> Instead of a unicharset and script_dir, lstmtraining now takes a 
> traineddata file on its command-line, to obtain all the information it 
> needs on the language to be learned. The traineddata *must* contain at 
> least an lstm-unicharset and lstm-recoder component, and may also contain 
> the three dawg files: lstm-punc-dawg, lstm-word-dawg, lstm-number-dawg. A 
> config file is also optional. The other components, if present, will be 
> ignored and unused.
>
> There is no tool to create the lstm-recoder directly. Instead there is a 
> new tool, combine_lang_model, which takes as input an input_unicharset and 
> script_dir (script_dir points to the langdata directory) and optional word 
> list files. It creates the lstm-recoder from the input_unicharset and 
> creates all the dawgs, if wordlists are provided, putting everything 
> together into a traineddata file.
>
>
>
>
> In the passage above, I could not find how to make an 'lstm-unicharset', so 
> I have no idea what to do.
>
>
> I also have a question about this page:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 
>
>
> NOTE Tesseract 4.00 will now run happily with a traineddata file that 
> contains *just* lang.lstm, lang.lstm-unicharset and lang.lstm-recoder. 
> The lstm-*-dawgs are optional, and *none of the other components are 
> required or used with OEM_LSTM_ONLY as the OCR engine mode.* No bigrams, 
> unichar ambigs or any of the other components are needed or even have any 
> effect if present. The only other component that does anything is the 
> lang.config, which can affect layout analysis, and sub-languages.
>
> If added to an existing Tesseract traineddata file, the lstm-unicharset 
> doesn't have to match the Tesseract unicharset, but the same unicharset must be 
> used to train the LSTM and build the lstm-*-dawgs files.
>
>
>
>
> According to the end of this wiki passage, the trained data is composed of 
> 'lang.lstm, lang.lstm-unicharset, lang.lstm-recoder' (mandatory).
>
> But the earlier 'Creating Starter Traineddata' passage says that the trained 
> data is composed of 'lstm-recoder, lstm-unicharset' (mandatory).
>
> Which statement is right?
>
>
> Please help me.
>
>
>
>
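
As a rough illustration of the step the quoted wiki passage describes, the sketch below builds the command lines for combine_lang_model (to create a starter traineddata from an input unicharset and the langdata directory) and for combine_tessdata -d (to list the components a traineddata file contains). It only assembles the commands and does not run them. The language code and all paths are hypothetical placeholders, and the flag names are the ones I believe the Tesseract 4 tools accept, so check them against the --help output of your build. The lstm-unicharset is not something you write by hand under this reading: the tool is expected to place it in the starter traineddata together with the lstm-recoder, while lang.lstm is the trained network itself and only appears once training has produced a model. That would make the two quoted passages consistent rather than contradictory.

```python
import shlex


def combine_lang_model_cmd(lang, input_unicharset, script_dir, output_dir,
                           words=None, puncs=None, numbers=None):
    """Build (but do not run) a combine_lang_model command line.

    All arguments are caller-supplied; the word list files are optional,
    matching the wiki text ("if wordlists are provided").
    """
    cmd = [
        "combine_lang_model",
        "--input_unicharset", input_unicharset,
        "--script_dir", script_dir,
        "--output_dir", output_dir,
        "--lang", lang,
    ]
    # Only pass the dawg sources that were actually given.
    for flag, path in (("--words", words),
                       ("--puncs", puncs),
                       ("--numbers", numbers)):
        if path:
            cmd += [flag, path]
    return cmd


def list_components_cmd(traineddata_path):
    """Build a combine_tessdata command that lists a file's components."""
    return ["combine_tessdata", "-d", traineddata_path]


if __name__ == "__main__":
    # Hypothetical language code and paths, for illustration only.
    starter = combine_lang_model_cmd(
        lang="eng",
        input_unicharset="train/eng.unicharset",
        script_dir="langdata",
        output_dir="starter",
        words="langdata/eng/eng.wordlist",
    )
    print(shlex.join(starter))
    print(shlex.join(list_components_cmd("starter/eng/eng.traineddata")))
```

Running the second command on the starter file and again on the final trained file should show whether lstm-unicharset, lstm-recoder and lstm are present in each.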


