On Thursday, March 1, 2018 at 5:02:00 AM UTC+1, 이경준 wrote:
>
> Hi,
>
> I have a question about making a traineddata file (Tesseract 4.0 LSTM).
>
> Tutorial Guide to lstmtraining
> Creating Starter Traineddata
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-starter-traineddata>
>
> NOTE: This is a new step!
>
> Instead of a unicharset and script_dir, lstmtraining now takes a
> traineddata file on its command-line, to obtain all the information it
> needs on the language to be learned. The traineddata *must* contain at
> least an lstm-unicharset and lstm-recoder component, and may also contain
> the three dawg files: lstm-punc-dawg lstm-word-dawg lstm-number-dawg. A
> config file is also optional. The other components, if present, will be
> ignored and unused.
>
> There is no tool to create the lstm-recoder directly. Instead there is a
> new tool, combine_lang_model, which takes as input an input_unicharset and
> script_dir (script_dir points to the langdata directory) and optional word
> list files. It creates the lstm-recoder from the input_unicharset and
> creates all the dawgs, if wordlists are provided, putting everything
> together into a traineddata file.
>
> In the passage above I could not find how to make an 'lstm-unicharset',
> so I have no idea how to create it.
>
> I also have a question about
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
> NOTE Tesseract 4.00 will now run happily with a traineddata file that
> contains *just* lang.lstm, lang.lstm-unicharset and lang.lstm-recoder.
> The lstm-*-dawgs are optional, and *none of the other components are
> required or used with OEM_LSTM_ONLY as the OCR engine mode.* No bigrams,
> unichar ambigs or any of the other components are needed or even have any
> effect if present. The only other component that does anything is the
> lang.config, which can affect layout analysis, and sub-languages.
>
> If added to an existing Tesseract traineddata file, the lstm-unicharset
> doesn't have to match the Tesseract unicharset, but the same unicharset
> must be used to train the LSTM and build the lstm-*-dawgs files.
>
> According to the end of this wiki passage, the traineddata consists of
> 'lang.lstm, lang.lstm-unicharset, lang.lstm-recoder' (mandatory),
>
> but the earlier 'Creating Starter Traineddata' passage says that the
> traineddata consists of 'lstm-recoder, lstm-unicharset' (mandatory).
>
> Which statement is right?
>
> Please help me.
>
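
One possible reading, offered as an assumption rather than something the wiki states in these words: the two passages describe the traineddata at different stages. The starter traineddata is the input to lstmtraining, so it has no trained network yet. The file Tesseract runs with also needs the trained lang.lstm component. The lstm-unicharset in the starter file presumably comes from the input_unicharset given to combine_lang_model. The small Python sketch below only compares the two component lists from the quoted text; the set and function names are hypothetical and are not part of any Tesseract tool.

```python
# Component names come from the quoted wiki text; the set and function names
# below are hypothetical and exist only for this illustration.
STARTER_REQUIRED = {"lstm-unicharset", "lstm-recoder"}
RECOGNITION_REQUIRED = {"lstm", "lstm-unicharset", "lstm-recoder"}
OPTIONAL = {"lstm-punc-dawg", "lstm-word-dawg", "lstm-number-dawg", "config"}


def missing_components(present, required):
    """Return the required component names that are absent, sorted."""
    return sorted(set(required) - set(present))


# Assumed example: a starter traineddata with one optional dawg and no
# trained network (the "lstm" component) yet.
starter = {"lstm-unicharset", "lstm-recoder", "lstm-word-dawg"}

print(missing_components(starter, STARTER_REQUIRED))      # []
print(missing_components(starter, RECOGNITION_REQUIRED))  # ['lstm']
```

Under that reading both sentences are right: the first list is what lstmtraining needs as input, and the second is what the finished file needs for recognition with OEM_LSTM_ONLY.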

