Hi, I am using this setup:
tesseract 4.0.0-beta.4
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8

 Found AVX2
 Found AVX
 Found SSE
I've trained on about 18,000 lines of Persian text. I generated the training data with this command:

bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
  --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
  --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
  --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
  --fontlist "Arial" \
  --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2

and then ran this:

sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
  --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
  --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
  --learning_rate 0.001 \
  --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
  --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
  --max_iterations 5000 \
  &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log

but it always shows "Compute CTC targets failed", and the resulting model performs very poorly.
I normalized my text, and each line of the text has at most 20 tokens.
Could you please help me?
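
For reference, the two things described above (the 20-token limit per line and how often the CTC message appears in the log) could be checked with something like the sketch below. The function names count_long_lines and count_ctc_failures are hypothetical helpers, not part of Tesseract, and the sketch assumes tokens are separated by whitespace and that the log contains the message text exactly as quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the training inputs and log.
# They are NOT part of Tesseract; they only use awk and grep.

# Print the number of lines in FILE that have more than MAX
# whitespace-separated tokens (MAX defaults to 20).
count_long_lines() {
  local file="$1" max="${2:-20}"
  awk -v max="$max" 'NF > max { n++ } END { print n + 0 }' "$file"
}

# Print how many lines of LOG contain the CTC failure message.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_ctc_failures() {
  local log="$1"
  grep -c "Compute CTC targets failed" "$log" || true
}
```

With these defined, the checks for this setup would be run as:

count_long_lines /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt
count_ctc_failures /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log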
 
