[tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

이경준 Thu, 01 Mar 2018 04:56:05 -0800

An additional question:

combine_tessdata -u kor.traineddata


What does the "-u" flag mean?

I cannot find that option (flag) documented on the GitHub wiki pages.

Could you give me an explanation?

On Wednesday, February 28, 2018 at 4:21:17 PM UTC+9, 이경준 wrote:
>
> Hi, I'm studying this passage, but I cannot understand what the flag 
> "--noextract_font_properties" means. So I looked at the file 
> /tesseract/training/tesstrain.sh,
>
> but I cannot find "--noextract_font_properties" there.
>
> Here is its usage text:
>
> # USAGE:
> #
> # tesstrain.sh
> #    --fontlist FONTS           # A list of fontnames to train on.
> #    --fonts_dir FONTS_PATH     # Path to font files.
> #    --lang LANG_CODE           # ISO 639 code.
> #    --langdata_dir DATADIR     # Path to tesseract/training/langdata directory.
> #    --output_dir OUTPUTDIR     # Location of output traineddata file.
> #    --overwrite                # Safe to overwrite files in output_dir.
> #    --linedata_only            # Only generate training data for lstmtraining.
> #    --run_shape_clustering     # Run shape clustering (use for Indic langs).
> #    --exposures EXPOSURES      # A list of exposure levels to use (e.g. "-1 0 1").
> #
> # OPTIONAL flags for input data. If unspecified we will look for them in
> # the langdata_dir directory.
> #    --training_text TEXTFILE   # Text to render and use for training.
> #    --wordlist WORDFILE        # Word list for the language ordered by
> #                               # decreasing frequency.
> #
> # OPTIONAL flag to specify location of existing traineddata files, required
> # during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
> # the current environment.
> #    --tessdata_dir TESSDATADIR     # Path to tesseract/tessdata directory.
> #
> # NOTE:
> # The font names specified in --fontlist need to be recognizable by Pango using
> # fontconfig. An easy way to list the canonical names of all fonts available on
> # your system is to run text2image with --list_available_fonts and the
> # appropriate --fonts_dir path.
>
> Using tesstrain
>
> The setup for running tesstrain.sh 
> <https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh> 
> is the same as for base Tesseract. Use the --linedata_only option for LSTM 
> training. Note that it is beneficial to have more training text and make 
> more pages though, as neural nets don't generalize as well and need to 
> train on something similar to what they will be running on. If the target 
> domain is severely limited, then all the dire warnings about needing a lot 
> of training data may not apply, but the network specification may need to 
> be changed.
>
> Training data is created using tesstrain.sh 
> <https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh> 
> as follows. Note that your fonts location may vary.
>
> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
>   --noextract_font_properties --langdata_dir ../langdata \
>   --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain
>
>
>
> Thank you very much. I would welcome a reply from anyone.
>
>
