Arabic was never trained with the legacy Tesseract engine, and I doubt you will get any improvement over the existing traineddata using cube or LSTM.
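If you still want to try the legacy route on the 3.05 branch, here is a rough dry-run sketch of the order that the legacy branch of tesstrain.sh follows. It only prints the commands and does not run anything. The function name legacy_training_plan and its arguments are made up for illustration; the command lines use your file names and follow the 3.0x training documentation as I recall it, so please verify them against the scripts. It assumes font_properties already exists. Note that shapeclustering, when enabled, runs before mftraining and takes both -F font_properties and -U unicharset.

```shell
#!/usr/bin/env bash
# Dry run: print the legacy (3.05 branch) training commands in the phase
# order used by the legacy branch of tesstrain.sh. Nothing is executed.
# legacy_training_plan is a hypothetical helper, not part of Tesseract.
legacy_training_plan() {
  local lang="$1" font="$2" exp="$3" shape_clustering="${4:-n}"
  local base="${lang}.${font}.exp${exp}"
  local f

  # Phase E: extract features from the image/box pair into ${base}.tr
  echo "tesseract ${base}.tif ${base} box.train"
  # Build the character set from the box file
  echo "unicharset_extractor ${base}.box"
  # Phase C: cluster prototypes (produces normproto)
  echo "cntraining ${base}.tr"
  # Phase S: optional, mirrors ENABLE_SHAPE_CLUSTERING in tesstrain.sh
  if [ "${shape_clustering}" = "y" ]; then
    echo "shapeclustering -F font_properties -U unicharset ${base}.tr"
  fi
  # Phase M: cluster microfeatures (produces inttemp, pffmtable, shapetable)
  echo "mftraining -F font_properties -U unicharset -O ${lang}.unicharset ${base}.tr"
  # Prefix the outputs with the language code, then combine them
  for f in inttemp normproto pffmtable shapetable; do
    echo "mv ${f} ${lang}.${f}"
  done
  echo "combine_tessdata ${lang}."
}

legacy_training_plan ara arial 4 y
```

Leave off the last argument (or pass n) to skip the shapeclustering step.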
You are free to experiment and see what you come up with. I have pointed to the bash scripts for training. Please refer to them for the correct process.

- excuse the brevity, sent from mobile

On 12-Apr-2017 4:00 PM, <srns...@gmail.com> wrote:

> Hello shree, Thank you for your valuable reply.. Are there any changes i
> need to follow for the steps below.. I request you to suggest the changes
> for the below commands, these are for tess 3.0
>
> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
> unicharset_extractor ara.arial.exp4.box
> echo "arial 0 0 1 0 0" > font_properties # tell Tesseract informations about the font
> mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
> shapeclustering -F unicharset ara.arial.exp4.tr
> cntraining ara.arial.exp4.tr
>
> mv inttemp ara.inttemp
> mv normproto ara.normproto
> mv pffmtable ara.pffmtable
> mv shapetable ara.shapetable
> combine_tessdata ara.
>
> Please suggest changes for the above steps. I plan to publish a rigorous
> explanative tutorial after getting overview of all the steps.
> Thank you.
>
> On Wednesday, April 12, 2017 at 3:38:11 PM UTC+5:30, shree wrote:
>>
>> see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
>>
>> if ((LINEDATA)); then
>>   phase_E_extract_features "lstm.train" 8 "lstmf"
>>   make__lstmdata
>> else
>>   phase_E_extract_features "box.train" 8 "tr"
>>   phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
>>   if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
>>     phase_S_cluster_shapes
>>   fi
>>   phase_M_cluster_microfeatures
>>   phase_B_generate_ambiguities
>>   make__traineddata
>> fi
>>
>> --------------------
>>
>> lstm.train is for LSTM training
>>
>> box.train is for 3.0 Tesseract legacy engine training
>>
>> Please note that current master code is for alpha testing for 4.0 LSTM
>> and will most probably drop support for legacy engine.
>>
>> If you want the legacy tesseract engine and train for it, please use the
>> 3.05 branch of the github repo.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduU4vx2rg0KdYqnxUjyhgJd4W1028P9S-5kK5S5OH77G9g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.