So, here's what I did:

1. I ran text2image with my training_text file:

    text2image --text /home/mobeen/customtrain/langdata/ara/ara.training_text \
      --outputbase /home/mobeen/customtrain/tiff-box/ara.Arial \
      --fonts_dir /home/mobeen/Documents/fonts \
      --font 'Arial'

   This gave me tiff and box files as output. I removed the box file created by text2image, as it is not in LSTM format.

2. Then I ran:

    tesseract /home/mobeen/customtrain/tiff-box/ara.Arial.tif /home/mobeen/customtrain/tiff-box/ara.Arial -l ara-new lstmbox

   This gave me the LSTM-format box file.

3. Next, I opened this box file, replaced all AEN with AWN, and saved the file.

4. Then I ran tesstrain with the --my_boxtiff_dir argument, as follows:

    src/training/tesstrain.sh \
      --fonts_dir /home/mobeen/Documents/fonts \
      --lang ara --linedata_only --noextract_font_properties \
      --langdata_dir ../langdata \
      --tessdata_dir ./tessdata \
      --output_dir ~/customtrain/aratrain \
      --fontlist 'Arial' \
      --my_boxtiff_dir /home/mobeen/customtrain/tiff-box

   This generated the lstmf file and gave me a starter traineddata file.

5.
Next I ran:

    training/lstmtraining --debug_interval -1 \
      --traineddata ~/customtrain/aratrain/ara/ara.traineddata \
      --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
      --model_output ~/customtrain/araoutput/base --learning_rate 20e-4 \
      --train_listfile ~/customtrain/aratrain/ara.training_files.txt \
      --eval_listfile ~/customtrain/araeval/ara.training_files.txt \
      --max_iterations 3600 &>~/customtrain/araoutput/basetrain.log

In another terminal window I ran:

    tail -f ~/customtrain/araoutput/basetrain.log

which displayed this:

    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 3 :
    Mean rms=0.585%, delta=0.957%, train=2.68%(4.53%), skip ratio=0%
    Iteration 3588: GROUND TRUTH : يف نأ ةفاضإ ١ مالفا و امك خيرات ٢ ةيسيئرلا ٣ مقر ٤ برعلا
    Iteration 3588: BEST OCR TEXT : يف نأ ةفشإ ١ مالا و امك خيراا ٢ ةيسيئرلا ٣ مقر ٤ برملا
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 4 :
    Mean rms=0.588%, delta=0.963%, train=2.691%(4.558%), skip ratio=0%
    Iteration 3589: GROUND TRUTH : ىدتنم ٨ نآلا دق ٥ مسق ٧ ةفاضإ _ ٦ عيقوتلا ٩ ةيبرعلا ىدتنم
    Iteration 3589: BEST OCR TEXT : ىدتنم ٥ نآلا هق ٥ مسا ٧ ةفاضإ _ ٦ عيقوتلا ٢ ةيبرعلا ىدتنم
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 5 :
    Mean rms=0.59%, delta=0.968%, train=2.705%(4.587%), skip ratio=0%
    Iteration 3590: GROUND TRUTH : ةيزمرلا ٦ ىلإ ٩ جماربلا ٨ ذنم ٥ ١ ىدتنملا ٧ نع ىدتنم
    Iteration 3590: BEST OCR TEXT : ةيزمرلا ١ ىلإ ٩ جماربلا ٨ انم ٥ ١ ىدتنسلا ٧ نع ىدتنم
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 6 :
    Mean rms=0.592%, delta=0.971%, train=2.717%(4.61%), skip ratio=0%
    Iteration 3591: GROUND TRUTH : هيف ٧ دمحأ ٩ ةيزمرلا ٣ دوك ٥ رورملا ١ حب هل ٦ ةفاك ٨ ماعلا ٣ يلع
    Iteration 3591: BEST OCR TEXT : هيف ٧ دمحأ ٣ ةيزمرلا ٣ دوك ٥ رورملا ٠ نب هل ٦ ةفا ٥ مسقا ٣ يلع
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 7 :
    Mean rms=0.594%, delta=0.976%, train=2.738%(4.643%), skip ratio=0%
    Iteration 3592: GROUND TRUTH : ىلعو ٧ نب ٦ ةكراشملا ٥ خيرات ٨ عيطتست ٩ ىلعألا
    Iteration 3592: BEST OCR TEXT : ىلاو ٧ نب ٩ ةكراشملا ٥ خيرقت ٨ عيقطتست ٩ ىلعأل
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 8 :
    Mean rms=0.596%, delta=0.979%, train=2.751%(4.689%), skip ratio=0%
    Iteration 3593: GROUND TRUTH : هيلع ٨ دئاصق ٦ لئاسرلا ٧ برغملا ٥ نيطسلف ١ يه ٣ ماظنلا ٩ تاكراشم
    Iteration 3593: BEST OCR TEXT : هيلع ٨ دئاضق ٩ لئاسرلا ٧ برتملا ٥ نيطسلفا ٢ يه ٣ ماظنلا ٩ تاكراشم
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 9 :
    Mean rms=0.599%, delta=0.984%, train=2.765%(4.722%), skip ratio=0%
    Iteration 3594: GROUND TRUTH : / ٩ ةديدج ٦ يذلا نإ ال ٧ سلجم ٩ هب ٠ ىلوألا ٥ روصلا ٨ لا راوزلا
    Iteration 3594: BEST OCR TEXT : / ٩ ةديدج ٦ يذلا نإ ال ٧ سدجم ٩ هب ٠ ىلوألا ٨ روصلا ٨ لا راولا
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 10 :
    Mean rms=0.601%, delta=0.987%, train=2.773%(4.739%), skip ratio=0%
    Iteration 3595: GROUND TRUTH : عيضاوم ٨ تاكراشم ٥ انب ٣ تانب ٧ رابخأ ٠ ىلع ٦ ريغ اذه دقو لكشب ٩
    Iteration 3595: BEST OCR TEXT : عيضاوم ٨ تاكراشم ٥ انب ٣ تانب ٧ رايخأ ٠ ىلع ٦ ريغ اذه دقو لكشب ٩
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 11 :
    Mean rms=0.602%, delta=0.988%, train=2.777%(4.744%), skip ratio=0%
    Iteration 3596: GROUND TRUTH : خيشلا ٩ ثحبلا ٨ رييغت ٦ نيب ١ مسا ءزجلا ٧ يف لالخ ٥ عوضوملا
    Iteration 3596: BEST OCR TEXT : خيللا ٩ ثحبلا ٨ ريغت ٦ نيب ١ مسا ءزجلا ٧ يف لالخ ٥ عوضوملا
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 12 :
    Mean rms=0.603%, delta=0.99%, train=2.782%(4.758%), skip ratio=0%
    Iteration 3597: GROUND TRUTH : موي ٦ نوكي نم ٨ ةيزم١ رلا ٥ىتح ٩ جمارب ٣ زكرم ٧ نأ ٠ عقوملا ريغ
    Iteration 3597: BEST OCR TEXT : موي ٦ نوكج نم ٨ ةيزم١ رلا ٥وغح ٦ جمارب ٣ زكرم ٧ نأ ٠ عقوملا ريغ
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 13 :
    Mean rms=0.605%, delta=0.993%, train=2.794%(4.775%), skip ratio=0%
    Iteration 3598: GROUND TRUTH : نم غلبي ٢ نودجاوتملا ٣ ةدهاشم ١ ظفح ٤ تاكراشملا ٠ ةطساوب
    Iteration 3598: BEST OCR TEXT : ني علبيب ٣ نوضجاوتملا ٣ ةداضشم ١ ثفنح ٤ تاكراشملا ٠ ةطساوب
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 14 :
    Mean rms=0.608%, delta=1%, train=2.819%(4.825%), skip ratio=0%
    Iteration 3599: GROUND TRUTH : يصخشلا ٨ دمحم ٥ ءاوح ١ جمارب هل ٦ ةروصلا و ٧ ماظن ٩ ماع ناكو
    Iteration 3599: BEST OCR TEXT : يصخشلا ٨ دمحم ٥ ءاوح ١ جمارب هل ١ ةروصلا و ٧ ماظنن ٩ ماع نقكر
    File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 15 :
    Mean rms=0.61%, delta=1.002%, train=2.831%(4.844%), skip ratio=0%
    At iteration 2182/3600/3600, Mean rms=0.61%, delta=1.002%, char train= 2.831%, word train=4.844%, skip ratio=0%, New worst char error = 2.831 wrote checkpoint.
    Finished! Error rate = 0.064

As you can see, it still reads AEN as AEN, not AWN.
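One way to check whether the AWN replacement actually reached the training data is to scan the log for GROUND TRUTH lines that still contain Eastern Arabic digits. Below is a minimal Python sketch, assuming the log is UTF-8 and that each GROUND TRUTH entry sits on its own line in the file; the function name is made up for illustration and is not part of Tesseract.

```python
import re

# Eastern Arabic digits occupy U+0660 (zero) through U+0669 (nine).
AEN_DIGIT = re.compile("[\u0660-\u0669]")


def ground_truth_lines_with_aen(log_text):
    """Return GROUND TRUTH log lines that still contain Eastern Arabic digits.

    Hypothetical helper: it only inspects text, it does not call Tesseract.
    """
    hits = []
    for line in log_text.splitlines():
        if "GROUND TRUTH" in line and AEN_DIGIT.search(line):
            hits.append(line)
    return hits


if __name__ == "__main__":
    import os

    # Log path taken from the lstmtraining command in this post.
    path = os.path.expanduser("~/customtrain/araoutput/basetrain.log")
    with open(path, encoding="utf-8") as log:
        for hit in ground_truth_lines_with_aen(log.read()):
            print(hit)
```

If this prints any lines, the .lstmf files were presumably built from text that still had AEN digits, so the edited box file may not have been the one picked up in step 4.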
Am I doing something wrong? What should I do?

On Monday, October 14, 2019 at 11:05:01 AM UTC+3, shree wrote:
>
> Replace AEN in your box files with AWN and rerun training, using the
> original tif files
>
> On Mon, Oct 14, 2019, 12:16 Mobeen Ali <[email protected]> wrote:
>
>> Hello everyone! I'm stuck with a problem of creating a traineddata file
>> that reads numerals in Arabic and gives output in English numerals.
>>
>> - Input = AEN Arabic Eastern Numbers {ِ٠١٢٣٤٥٦٧٨٩}
>> - Output = AWN Arabic Western Numbers {0123456789}
>>
>> I have created a traineddata file successfully, with no issues and very
>> good accuracy, but this traineddata file takes Arabic numerals as input
>> and gives Arabic numerals as output.
>>
>> What I want is for it to take Arabic numerals as input and give
>> English numerals as output.
>>
>> Please, if someone knows anything, I need help!
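The replacement described in step 3 (and in shree's reply) can also be scripted instead of done by hand. This is only a sketch in Python, assuming the box file is UTF-8 and that every Eastern Arabic digit in it should become its Western counterpart; box coordinates are already ASCII digits, so they are left unchanged. The function names and the output path are hypothetical.

```python
# Map Eastern Arabic digits (U+0660..U+0669) to ASCII digits 0-9.
AEN_TO_AWN = {0x0660 + i: ord("0") + i for i in range(10)}


def to_western_digits(text):
    """Replace every Eastern Arabic digit in text with its ASCII equivalent."""
    return text.translate(AEN_TO_AWN)


def convert_box_file(src_path, dst_path):
    """Write a copy of a box file with AEN digits replaced by AWN digits.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding="utf-8", newline="") as src:
        converted = to_western_digits(src.read())
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(converted)


if __name__ == "__main__":
    # Input path from step 2 of this post; the output name is an assumption.
    convert_box_file(
        "/home/mobeen/customtrain/tiff-box/ara.Arial.box",
        "/home/mobeen/customtrain/tiff-box/ara.Arial.awn.box",
    )
```

The converted file would still need to replace the original box file next to ara.Arial.tif before rerunning step 4, so that the .lstmf files are regenerated from it.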

