I know. Actually, I am well versed in LSTM. I want to resolve all the errors and then train on a large text. With version alpha I trained on about 1000 lines and the result was not so bad, but with version beta 4 I get many errors. In alpha:

# Use LSTM
tessedit_ocr_engine_mode 1
tessedit_pageseg_mode 6
# Arabic page layout variables
segment_nonalphabetic_script 1
# Avoid dropping rows
textord_noise_rowratio 20.0
textord_noise_syfract 0.6
textord_min_linesize 2.5
# Avoid over-estimating intra-word spacing at both row and
# block levels when using old to method
tosp_old_to_method T
tosp_old_to_constrain_sp_kn T
tosp_old_sp_kn_th_factor 4.0
tosp_only_small_gaps_for_kern T
tosp_use_pre_chopping T

I used all of these, but now my model doesn't learn. Has anything changed in beta 4, for example in text2image?

On Wed, Sep 26, 2018 at 12:53 AM Shree Devi Kumar <shreesh...@gmail.com> wrote:

> --fontlist "Arial"
>
> Does that have good coverage for Farsi?
>
> --max_iterations 5000
>
> You are trying to train from scratch with 18000 lines of text and only
> 5000 iterations. That will not work.
>
> Ray has trained on hundreds of thousands of lines of text and millions of
> iterations.
>
> On Tue, 25 Sep 2018, 16:20 Zohreh Khosrobeygi, <beigy.zoh...@gmail.com>
> wrote:
>
>> Hi, I use this:
>> tesseract 4.0.0-beta.4
>> leptonica-1.74.4
>> libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib
>> 1.2.8
>>
>> Found AVX2
>> Found AVX
>> Found SSE
>> I've trained on about 18000 lines for the Persian language.
>> I use this command:
>>
>> bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
>>   --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
>>   --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
>>   --linedata_only \
>>   --noextract_font_properties \
>>   --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
>>   --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
>>   --fontlist "Arial" \
>>   --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2
>>
>> and then run this:
>>
>> sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
>>   --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
>>   --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
>>   --learning_rate 0.001 \
>>   --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
>>   --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>   --max_iterations 5000 \
>>   &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log
>>
>> but it always shows "Compute CTC targets failed" and the model does not
>> perform well at all.
>> I normalized my text, and each line of the text has at most 20 tokens.
>> Could you please help me?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/04872dc6-7d92-4f95-9f65-8bb0cbf87c8c%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

--
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir <khosrobeygi.zoh...@ut.ac.ir>
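
The concern raised in the thread about `--max_iterations 5000` against roughly 18000 lines can be made concrete with a little arithmetic. The sketch below assumes that one lstmtraining iteration consumes one text-line image; that assumption should be checked against the Tesseract training documentation. The function name `coverage_percent` is hypothetical and is not part of Tesseract. The numbers are the ones quoted in the thread.

```shell
#!/bin/sh
# Rough estimate of how much of the training set a run can visit.
# Assumption (not confirmed in the thread): one training iteration
# processes exactly one text-line image.
# coverage_percent is a hypothetical helper, not a Tesseract tool.
coverage_percent() {
    iterations=$1   # value passed to --max_iterations
    lines=$2        # number of lines in the training text
    if [ "$lines" -le 0 ]; then
        echo "number of lines must be positive" >&2
        return 1
    fi
    # Integer arithmetic: the result is rounded down to a whole percent.
    echo $(( iterations * 100 / lines ))
}

# Values quoted in the thread: 5000 iterations, about 18000 lines.
coverage_percent 5000 18000
```

Under that assumption the run would stop before visiting even a third of the lines once, which is consistent with the advice that far more iterations are needed when training from scratch.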