Re: [tesseract-ocr] Input in Arabic Eastern Numbers and Output in Arabic Western Numbers

Mobeen Ali Sun, 01 Dec 2019 03:42:25 -0800

So, here's what I did:

   1. I ran text2image with my training_text file:
   text2image --text /home/mobeen/customtrain/langdata/ara/ara.training_text \
   --outputbase /home/mobeen/customtrain/tiff-box/ara.Arial \
   --fonts_dir /home/mobeen/Documents/fonts \
   --font 'Arial'
   This gave me tiff and box files as output. I removed the box file 
   created by text2image because it is not in LSTM format.
   2. Then I ran:
   tesseract /home/mobeen/customtrain/tiff-box/ara.Arial.tif \
   /home/mobeen/customtrain/tiff-box/ara.Arial -l ara-new lstmbox
   This gave me the LSTM-format box file.
   3. Next I opened this box file, replaced all AEN with AWN, and saved the 
   file.
   4. Then I ran tesstrain with the --my_boxtiff_dir argument, as follows: 
   src/training/tesstrain.sh \
   --fonts_dir /home/mobeen/Documents/fonts \
   --lang ara --linedata_only --noextract_font_properties \
   --langdata_dir ../langdata \
   --tessdata_dir ./tessdata \
   --output_dir ~/customtrain/aratrain \
   --fontlist 'Arial' \
   --my_boxtiff_dir /home/mobeen/customtrain/tiff-box
   This generated the lstmf file and gave me a starter traineddata file.
   5. Next I ran: 
   training/lstmtraining --debug_interval -1 \
   --traineddata ~/customtrain/aratrain/ara/ara.traineddata \
   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
   --model_output ~/customtrain/araoutput/base --learning_rate 20e-4 \
   --train_listfile ~/customtrain/aratrain/ara.training_files.txt \
   --eval_listfile ~/customtrain/araeval/ara.training_files.txt \
   --max_iterations 3600 &>~/customtrain/araoutput/basetrain.log
   In another terminal window I ran: 
   tail -f ~/customtrain/araoutput/basetrain.log
   which displayed this: 
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 3 :
   Mean rms=0.585%, delta=0.957%, train=2.68%(4.53%), skip ratio=0%
   Iteration 3588: GROUND  TRUTH : يف نأ ةفاضإ ١ مالفا و امك خيرات ٢ 
   ةيسيئرلا ٣ مقر ٤ برعلا
   Iteration 3588: BEST OCR TEXT : يف نأ ةفشإ ١ مالا و امك خيراا ٢ ةيسيئرلا 
   ٣ مقر ٤ برملا
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 4 :
   Mean rms=0.588%, delta=0.963%, train=2.691%(4.558%), skip ratio=0%
   Iteration 3589: GROUND  TRUTH : ىدتنم ٨ نآلا دق ٥ مسق ٧ ةفاضإ _ ٦ عيقوتلا 
   ٩ ةيبرعلا ىدتنم
   Iteration 3589: BEST OCR TEXT : ىدتنم ٥ نآلا هق ٥ مسا ٧ ةفاضإ _ ٦ عيقوتلا 
   ٢ ةيبرعلا ىدتنم
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 5 :
   Mean rms=0.59%, delta=0.968%, train=2.705%(4.587%), skip ratio=0%
   Iteration 3590: GROUND  TRUTH : ةيزمرلا ٦ ىلإ ٩ جماربلا ٨ ذنم ٥ ١ ىدتنملا 
   ٧ نع ىدتنم
   Iteration 3590: BEST OCR TEXT : ةيزمرلا ١ ىلإ ٩ جماربلا ٨ انم ٥ ١ ىدتنسلا 
   ٧ نع ىدتنم
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 6 :
   Mean rms=0.592%, delta=0.971%, train=2.717%(4.61%), skip ratio=0%
   Iteration 3591: GROUND  TRUTH : هيف ٧ دمحأ ٩ ةيزمرلا ٣ دوك ٥ رورملا ١ حب 
   هل ٦ ةفاك ٨ ماعلا ٣ يلع
   Iteration 3591: BEST OCR TEXT : هيف ٧ دمحأ ٣ ةيزمرلا ٣ دوك ٥ رورملا ٠ نب 
   هل ٦ ةفا ٥ مسقا ٣ يلع
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 7 :
   Mean rms=0.594%, delta=0.976%, train=2.738%(4.643%), skip ratio=0%
   Iteration 3592: GROUND  TRUTH : ىلعو ٧ نب ٦ ةكراشملا ٥ خيرات ٨ عيطتست ٩ 
   ىلعألا
   Iteration 3592: BEST OCR TEXT : ىلاو ٧ نب ٩ ةكراشملا ٥ خيرقت ٨ عيقطتست ٩ 
   ىلعأل
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 8 :
   Mean rms=0.596%, delta=0.979%, train=2.751%(4.689%), skip ratio=0%
   Iteration 3593: GROUND  TRUTH : هيلع ٨ دئاصق ٦ لئاسرلا ٧ برغملا ٥ نيطسلف 
   ١ يه ٣ ماظنلا ٩ تاكراشم
   Iteration 3593: BEST OCR TEXT : هيلع ٨ دئاضق ٩ لئاسرلا ٧ برتملا ٥ نيطسلفا 
   ٢ يه ٣ ماظنلا ٩ تاكراشم
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 9 :
   Mean rms=0.599%, delta=0.984%, train=2.765%(4.722%), skip ratio=0%
   Iteration 3594: GROUND  TRUTH : / ٩ ةديدج ٦ يذلا نإ ال ٧ سلجم ٩ هب ٠ 
   ىلوألا ٥ روصلا ٨ لا راوزلا
   Iteration 3594: BEST OCR TEXT : / ٩ ةديدج ٦ يذلا نإ ال ٧ سدجم ٩ هب ٠ 
   ىلوألا ٨ روصلا ٨ لا راولا
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 10 :
   Mean rms=0.601%, delta=0.987%, train=2.773%(4.739%), skip ratio=0%
   Iteration 3595: GROUND  TRUTH : عيضاوم ٨ تاكراشم ٥ انب ٣ تانب ٧ رابخأ ٠ 
   ىلع ٦ ريغ اذه دقو لكشب ٩
   Iteration 3595: BEST OCR TEXT : عيضاوم ٨ تاكراشم ٥ انب ٣ تانب ٧ رايخأ ٠ 
   ىلع ٦ ريغ اذه دقو لكشب ٩
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 11 :
   Mean rms=0.602%, delta=0.988%, train=2.777%(4.744%), skip ratio=0%
   Iteration 3596: GROUND  TRUTH : خيشلا ٩ ثحبلا ٨ رييغت ٦ نيب ١ مسا ءزجلا ٧ 
   يف لالخ ٥ عوضوملا
   Iteration 3596: BEST OCR TEXT : خيللا ٩ ثحبلا ٨ ريغت ٦ نيب ١ مسا ءزجلا ٧ 
   يف لالخ ٥ عوضوملا
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 12 :
   Mean rms=0.603%, delta=0.99%, train=2.782%(4.758%), skip ratio=0%
   Iteration 3597: GROUND  TRUTH : موي ٦ نوكي نم ٨ ةيزم١ رلا ٥ىتح ٩ جمارب ٣ 
   زكرم ٧ نأ ٠ عقوملا ريغ
   Iteration 3597: BEST OCR TEXT : موي ٦ نوكج نم ٨ ةيزم١ رلا ٥وغح ٦ جمارب ٣ 
   زكرم ٧ نأ ٠ عقوملا ريغ
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 13 :
   Mean rms=0.605%, delta=0.993%, train=2.794%(4.775%), skip ratio=0%
   Iteration 3598: GROUND  TRUTH : نم غلبي ٢ نودجاوتملا ٣ ةدهاشم ١ ظفح ٤ 
   تاكراشملا ٠ ةطساوب
   Iteration 3598: BEST OCR TEXT : ني علبيب ٣ نوضجاوتملا ٣ ةداضشم ١ ثفنح ٤ 
   تاكراشملا ٠ ةطساوب
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 14 :
   Mean rms=0.608%, delta=1%, train=2.819%(4.825%), skip ratio=0%
   Iteration 3599: GROUND  TRUTH : يصخشلا ٨ دمحم ٥ ءاوح ١ جمارب هل ٦ ةروصلا 
   و ٧ ماظن ٩ ماع ناكو
   Iteration 3599: BEST OCR TEXT : يصخشلا ٨ دمحم ٥ ءاوح ١ جمارب هل ١ ةروصلا 
   و ٧ ماظنن ٩ ماع نقكر
   File /home/mobeen/customtrain/aratrain/ara.Arial.exp0.lstmf line 15 :
   Mean rms=0.61%, delta=1.002%, train=2.831%(4.844%), skip ratio=0%
   At iteration 2182/3600/3600, Mean rms=0.61%, delta=1.002%,
   char train=2.831%, word train=4.844%, skip ratio=0%,
   New worst char error = 2.831 wrote checkpoint.
   
   Finished! Error rate = 0.064
   As you can see, it still reads AEN as AEN, not AWN.
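
For step 3, the replacement can also be scripted instead of done by hand. Below is a minimal sketch in Python, assuming the box file is UTF-8 text. The function names are made up for illustration and are not part of Tesseract.

```python
# Sketch only: replace Arabic Eastern digits (U+0660..U+0669) with
# Western digits 0-9. Function names here are hypothetical.

# Build the digit strings from code points to avoid right-to-left
# ordering surprises in the editor.
AEN = "".join(chr(0x0660 + i) for i in range(10))
AWN = "0123456789"
AEN_TO_AWN = str.maketrans(AEN, AWN)


def to_western_digits(text: str) -> str:
    """Return text with every AEN digit replaced by its AWN digit."""
    return text.translate(AEN_TO_AWN)


def convert_box_file(path: str) -> int:
    """Rewrite a box file in place; return how many digits were replaced."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    converted = to_western_digits(original)
    # translate() maps one character to one character, so positions line up.
    replaced = sum(1 for a, b in zip(original, converted) if a != b)
    with open(path, "w", encoding="utf-8") as f:
        f.write(converted)
    return replaced
```

Calling convert_box_file on the box file from step 2 and checking that the returned count is greater than zero would confirm that the file really contained AEN before the edit. The coordinates in a box file are already Western digits, so translating the whole file leaves them unchanged.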


Am I doing something wrong? What should I do?
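
One check that might help narrow this down: the GROUND TRUTH lines in the log above still contain AEN, which suggests the .lstmf files may not have been built from the edited box file. Below is a rough sketch in Python for counting such lines in the training log, assuming the log format shown above. The helper names are hypothetical.

```python
# Sketch only: count GROUND TRUTH lines in an lstmtraining log that still
# contain Arabic Eastern digits. Assumes the log format pasted above.
import re

AEN_DIGIT = re.compile("[\u0660-\u0669]")
GROUND_TRUTH = re.compile(r"GROUND\s+TRUTH")


def ground_truth_lines(log_text: str) -> list:
    """Return the log lines that report the ground-truth text."""
    return [line for line in log_text.splitlines() if GROUND_TRUTH.search(line)]


def count_aen_in_ground_truth(log_text: str) -> tuple:
    """Return (ground-truth lines containing AEN, total ground-truth lines)."""
    lines = ground_truth_lines(log_text)
    with_aen = sum(1 for line in lines if AEN_DIGIT.search(line))
    return with_aen, len(lines)
```

If the first number is not zero after retraining, the training data itself still has AEN as its target text, so the model would be expected to keep producing AEN.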


On Monday, October 14, 2019 at 11:05:01 AM UTC+3, shree wrote:
>
> Replace AEN in your box files with AWN and rerun training, using the 
> original tif files
>
> On Mon, Oct 14, 2019, 12:16 Mobeen Ali <[email protected]> wrote:
>
>> Hello everyone! I'm stuck on creating a traineddata file that reads 
>> Arabic numerals and gives English numerals as output.
>>
>>    - Input = AEN Arabic Eastern Numbers {ِ٠١٢٣٤٥٦٧٨٩}
>>    - Output = AWN Arabic Western Numbers {0123456789}
>>
>> I have successfully created a traineddata file with very good accuracy, 
>> but it takes Arabic numerals as input and gives Arabic numerals as 
>> output.
>>
>> What I want is for it to take Arabic numerals as input and give 
>> English numerals as output.
>>
>> Please help if you know anything about this!
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/2edb580d-c16e-4b0a-a704-15929982a372%40googlegroups.com.
>>
>

