Sorry, the commands I gave earlier were for Arabic. I was actually referring to 
English. 

tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train
unicharset_extractor eng.arial.exp4.box
echo "arial 0 0 1 0 0" > font_properties # tell Tesseract about the font
mftraining -F font_properties -U unicharset -O eng.unicharset eng.arial.exp4.tr
shapeclustering -F font_properties -U unicharset eng.arial.exp4.tr
cntraining eng.arial.exp4.tr

mv inttemp eng.inttemp
mv normproto eng.normproto
mv pffmtable eng.pffmtable
mv shapetable eng.shapetable
combine_tessdata eng.
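
For reference, a minimal dry-run sketch of how the feature-extraction step seems to differ between the two engines, going only by the tesstrain.sh excerpt quoted later in this thread (lstm.train for LSTM, box.train for the legacy engine). The function name feature_cmd and the engine labels "legacy" and "lstm" are made up for illustration, and the commands are printed rather than executed. The remaining LSTM training steps are not covered here.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the feature-extraction command
# for a given engine. "feature_cmd" and the engine labels are
# hypothetical names, not part of Tesseract.
feature_cmd() {
  base="$1"     # e.g. eng.arial.exp4 (image is expected at ${base}.tif)
  engine="$2"   # "legacy" or "lstm"
  case "$engine" in
    legacy) config="nobatch box.train" ;;  # as in the 3.0 commands above
    lstm)   config="lstm.train" ;;         # config name taken from tesstrain.sh
    *) echo "unknown engine: $engine" >&2; return 1 ;;
  esac
  echo "tesseract ${base}.tif ${base} ${config}"
}

feature_cmd eng.arial.exp4 legacy
feature_cmd eng.arial.exp4 lstm
```

Printing the command first makes it easy to compare the two variants before running either one against a real image.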


Please suggest the changes needed to run the above commands with 
Tesseract 4.0; they were written for Tesseract 3.0. I plan to publish a rigorous 
explanatory tutorial once I have an overview of all the steps.
Thank you.






On Wednesday, April 12, 2017 at 4:04:42 PM UTC+5:30, shree wrote:
>
> Arabic was never trained with the legacy tesseract engine and I doubt you 
> will get any improvement over existing traineddata using cube or lstm.
>
> You are free to experiment and see what you come up with.
>
> I have pointed to the bash scripts for training. Please refer to them for 
> the correct process.
>
> - excuse the brevity, sent from mobile
>
> On 12-Apr-2017 4:00 PM, <[email protected]> wrote:
>
>> Hello shree, thank you for your valuable reply. Are there any changes I 
>> need to make to the steps below? Please suggest the changes for these 
>> commands; they are for Tesseract 3.0.
>>
>> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
>> unicharset_extractor ara.arial.exp4.box
>> echo "arial 0 0 1 0 0" > font_properties # tell Tesseract about the font
>> mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
>> shapeclustering -F font_properties -U unicharset ara.arial.exp4.tr
>> cntraining ara.arial.exp4.tr
>>
>> mv inttemp ara.inttemp
>> mv normproto ara.normproto
>> mv pffmtable ara.pffmtable
>> mv shapetable ara.shapetable
>> combine_tessdata ara.
>>
>>
>> Please suggest changes for the above steps. I plan to publish a rigorous 
>> explanatory tutorial once I have an overview of all the steps.
>> Thank you.
>>
>>
>> On Wednesday, April 12, 2017 at 3:38:11 PM UTC+5:30, shree wrote:
>>>
>>> see 
>>> https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
>>>
>>>
>>> if ((LINEDATA)); then
>>>   phase_E_extract_features "lstm.train" 8 "lstmf"
>>>   make__lstmdata
>>> else
>>>   phase_E_extract_features "box.train" 8 "tr"
>>>   phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
>>>   if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
>>>       phase_S_cluster_shapes
>>>   fi
>>>   phase_M_cluster_microfeatures
>>>   phase_B_generate_ambiguities
>>>   make__traineddata
>>> fi
>>>
>>> --------------------
>>>
>>> lstm.train is for LSTM training
>>>
>>> box.train is for 3.0 Tesseract legacy engine training
>>>
>>> Please note that current master code is for alpha testing for 4.0 LSTM 
>>> and will most probably drop support for legacy engine.
>>>
>>> If you want the legacy tesseract engine and train for it, please use the 
>>> 3.05 branch of the github repo.
>>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/e4a2c775-6e31-4a48-9e37-f981f862d37f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
