Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
LSTM training is not like legacy training. Please read the wiki pages on 4.0 training; I have given sample commands for everything there. There are three different ways of training. Read the training bash scripts to learn more. tesstrain.sh with --linedata_only creates the box/tiff pairs but

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Sorry, I gave the wrong commands for Arabic. I was actually referring to English:
tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train
unicharset_extractor eng.arial.exp4.box
echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
mftraining -F

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Arabic was never trained with the legacy Tesseract engine, and I doubt you will get any improvement over the existing traineddata using Cube or LSTM. You are free to experiment and see what you come up with. I have pointed to the bash scripts for training. Please refer to them for the correct

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Hello Shree, thank you for your valuable reply. Are there any changes I need to make to the steps below? Please suggest changes to the commands below, which are for tess 3.0:
tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
unicharset_extractor ara.arial.exp4.box

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
if ((LINEDATA)); then
  phase_E_extract_features "lstm.train" 8 "lstmf"
  make__lstmdata
else
  phase_E_extract_features "box.train" 8 "tr"
  phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
  if
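The branch quoted from tesstrain.sh pairs the "lstm.train" config with .lstmf output and the "box.train" config with .tr output. A minimal dry-run sketch of that choice follows. It only prints the command line and does not run tesseract. The helper name extract_cmd is hypothetical and is not part of tesstrain.sh, and the exact arguments that phase_E_extract_features passes to tesseract are an assumption here; the legacy form is copied from the commands posted in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints a feature-extraction command, never runs tesseract.
# extract_cmd is a hypothetical helper for illustration, not part of tesstrain.sh.
extract_cmd() {
  linedata="$1"        # 1 = LSTM line data (.lstmf), 0 = legacy features (.tr)
  image="$2"           # e.g. eng.arial.exp4.tif
  base="${image%.*}"   # output base name: the image name without its extension
  if [ "$linedata" -eq 1 ]; then
    # Assumed LSTM form: the lstm.train config is expected to write ${base}.lstmf
    echo "tesseract ${image} ${base} lstm.train"
  else
    # Legacy form as posted in this thread: box.train writes ${base}.tr
    echo "tesseract ${image} ${base} nobatch box.train"
  fi
}

extract_cmd 1 eng.arial.exp4.tif
extract_cmd 0 eng.arial.exp4.tif
```

Under these assumptions, getting .tr files from a box.train run is the legacy path, not the LSTM one.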

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Can you please tell me whether this command is right for Tesseract 4: tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train It produces .tr files when I run it in Tesseract 4 to train from image files. On Wednesday, April 12, 2017 at 2:19:24 PM UTC+5:30, shree

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread Ahmad Moawad
Thanks for your reply, Shree, I appreciate it. My question is: is that the right path for training Tesseract 4.0 LSTM or not? On Wednesday, April 12, 2017 at 10:49:24 AM UTC+2, shree wrote:
> Read the bash scripts in
> tesstrain.sh
> tesstrain_utils.sh
> language_specific.sh
> In training

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Read the bash scripts tesstrain.sh, tesstrain_utils.sh and language_specific.sh in the training directory to understand LSTM training in more detail. On 12-Apr-2017 10:47 AM, "Ahmad Moawad" wrote: > this is the part from

[tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-11 Thread Ahmad Moawad
This is the part from https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 My question is about training from images, not about generating training data from text. The overall training process is similar to training 3.04