LSTM training is not like legacy training. Please read the wiki pages on
4.0 training; I have given all the sample commands there. There are 3
different ways of training.

Read the training bash scripts to learn more.

tesstrain.sh with --linedata_only creates the box/tiff pairs, but only the
.lstmf files are saved in the output dir.

Without --linedata_only you will get 3.0 traineddata.
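A minimal sketch of the two tesstrain.sh invocations. It assumes the flag spelling `--linedata_only` used by the 4.0 tesstrain.sh, that it is run from the tesseract source root (the script lives at training/tesstrain.sh), and that the directory defaults are hypothetical placeholders you replace with your own paths. The `run` helper and `make_training_data` function are illustrative names, not part of the Tesseract tools.

```bash
#!/usr/bin/env bash
# Sketch: generate training data with training/tesstrain.sh in either mode.
# The directory defaults below are hypothetical placeholders; adjust them.
FONTS_DIR=${FONTS_DIR:-/usr/share/fonts}
LANGDATA_DIR=${LANGDATA_DIR:-../langdata}
TESSDATA_DIR=${TESSDATA_DIR:-./tessdata}
OUTPUT_DIR=${OUTPUT_DIR:-$HOME/tesstutorial/engtrain}

# Print each command, and execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [[ -z "${DRY_RUN:-}" ]]; then
    "$@"
  fi
}

# make_training_data MODE LANG
#   MODE=lstm   -> adds --linedata_only: only .lstmf files reach OUTPUT_DIR
#   MODE=legacy -> no flag: box.train path, producing 3.0 traineddata
make_training_data() {
  local mode=$1 lang=$2
  local args=(training/tesstrain.sh --lang "$lang"
    --fonts_dir "$FONTS_DIR"
    --langdata_dir "$LANGDATA_DIR"
    --tessdata_dir "$TESSDATA_DIR"
    --output_dir "$OUTPUT_DIR")
  case $mode in
    lstm) args+=(--linedata_only) ;;
    legacy) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  run "${args[@]}"
}
```

`DRY_RUN=1 make_training_data lstm eng` only prints the command line; drop `DRY_RUN` to actually run it.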

Several more steps are needed to turn the .lstmf files into the final 4.0
traineddata.
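As an illustration of those steps, here is a sketch of fine-tuning only, which starts from an existing traineddata. It assumes the `combine_tessdata -e` and `lstmtraining` flags (`--continue_from`, `--model_output`, `--traineddata`, `--train_listfile`, `--max_iterations`, `--stop_training`) as documented for the released 4.0 tools; flags changed during the alpha, so check the wiki against your build. It also assumes tesstrain.sh wrote a `<lang>.training_files.txt` list into the output dir. The paths and the `run`/`lstm_finetune` names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: fine-tune an existing LSTM model on .lstmf files made by
# tesstrain.sh --linedata_only. Flags follow the released 4.0 lstmtraining;
# the directory defaults are hypothetical placeholders.
TESSDATA_DIR=${TESSDATA_DIR:-./tessdata}
OUTPUT_DIR=${OUTPUT_DIR:-$HOME/tesstutorial/engtrain}   # holds the .lstmf files
WORK_DIR=${WORK_DIR:-$HOME/tesstutorial/eng_finetune}   # checkpoints + result

# Print each command, and execute it only when DRY_RUN is unset or empty.
run() {
  echo "+ $*"
  if [[ -z "${DRY_RUN:-}" ]]; then
    "$@"
  fi
}

# lstm_finetune LANG MAX_ITERATIONS
lstm_finetune() {
  local lang=$1 iterations=$2
  local base="$TESSDATA_DIR/$lang.traineddata"
  run mkdir -p "$WORK_DIR" || return
  # 1. Extract the LSTM component of the existing traineddata.
  run combine_tessdata -e "$base" "$WORK_DIR/$lang.lstm" || return
  # 2. Continue training on the .lstmf files listed by tesstrain.sh.
  run lstmtraining --continue_from "$WORK_DIR/$lang.lstm" \
    --model_output "$WORK_DIR/base" \
    --traineddata "$base" \
    --train_listfile "$OUTPUT_DIR/$lang.training_files.txt" \
    --max_iterations "$iterations" || return
  # 3. Convert the last checkpoint into a final 4.0 traineddata.
  run lstmtraining --stop_training \
    --continue_from "$WORK_DIR/base_checkpoint" \
    --traineddata "$base" \
    --model_output "$WORK_DIR/$lang.traineddata"
}
```

`DRY_RUN=1 lstm_finetune eng 400` prints the four commands without running them; each step stops the chain if the previous one fails.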

Since you want to write a tutorial, please do your own reading and trials
first.


- excuse the brevity, sent from mobile

On 12-Apr-2017 4:08 PM, <srns...@gmail.com> wrote:

> Sorry, I gave the commands for Arabic by mistake. I was actually
> referring to English.
>
> tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train
> unicharset_extractor eng.arial.exp4.box
> echo "arial 0 0 1 0 0" > font_properties # font name, then italic bold fixed serif fraktur flags
> shapeclustering -F font_properties -U unicharset eng.arial.exp4.tr
> mftraining -F font_properties -U unicharset -O eng.unicharset eng.arial.exp4.tr
> cntraining eng.arial.exp4.tr
>
> mv inttemp eng.inttemp
> mv normproto eng.normproto
> mv pffmtable eng.pffmtable
> mv shapetable eng.shapetable
> combine_tessdata eng.
>
>
> I request you to suggest the changes to the above commands for
> Tesseract 4.0; these commands are for Tesseract 3.0.
> I plan to publish a rigorous, explanatory tutorial after getting an
> overview of all the steps.
> Thank you.
>
> On Wednesday, April 12, 2017 at 4:04:42 PM UTC+5:30, shree wrote:
>>
>> Arabic was never trained with the legacy Tesseract engine, and I doubt
>> you will get any improvement over the existing traineddata using Cube or
>> LSTM.
>>
>> You are free to experiment and see what you come up with.
>>
>> I have pointed to the bash scripts for training. Please refer to them for
>> the correct process.
>>
>> - excuse the brevity, sent from mobile
>>
>> On 12-Apr-2017 4:00 PM, <srn...@gmail.com> wrote:
>>
>>> Hello shree, thank you for your valuable reply. Are there any changes I
>>> need to make to the steps below? I request you to suggest the changes
>>> for the commands below; these are for Tesseract 3.0.
>>>
>>> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
>>> unicharset_extractor ara.arial.exp4.box
>>> echo "arial 0 0 1 0 0" > font_properties # font name, then italic bold fixed serif fraktur flags
>>> shapeclustering -F font_properties -U unicharset ara.arial.exp4.tr
>>> mftraining -F font_properties -U unicharset -O ara.unicharset ara.arial.exp4.tr
>>> cntraining ara.arial.exp4.tr
>>>
>>> mv inttemp ara.inttemp
>>> mv normproto ara.normproto
>>> mv pffmtable ara.pffmtable
>>> mv shapetable ara.shapetable
>>> combine_tessdata ara.
>>>
>>>
>>> Please suggest changes for the above steps. I plan to publish a rigorous,
>>> explanatory tutorial after getting an overview of all the steps.
>>> Thank you.
>>>
>>>
>>> On Wednesday, April 12, 2017 at 3:38:11 PM UTC+5:30, shree wrote:
>>>>
>>>> see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
>>>>
>>>>
>>>> if ((LINEDATA)); then
>>>>   phase_E_extract_features "lstm.train" 8 "lstmf"
>>>>   make__lstmdata
>>>> else
>>>>   phase_E_extract_features "box.train" 8 "tr"
>>>>   phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
>>>>   if [[ "${ENABLE_SHAPE_CLUSTERING}" == "y" ]]; then
>>>>       phase_S_cluster_shapes
>>>>   fi
>>>>   phase_M_cluster_microfeatures
>>>>   phase_B_generate_ambiguities
>>>>   make__traineddata
>>>> fi
>>>>
>>>> --------------------
>>>>
>>>> lstm.train is for LSTM training
>>>>
>>>> box.train is for 3.0 Tesseract legacy engine training
>>>>
>>>> Please note that the current master code is for alpha testing of 4.0
>>>> LSTM and will most probably drop support for the legacy engine.
>>>>
>>>> If you want to use and train the legacy Tesseract engine, please use
>>>> the 3.05 branch of the GitHub repo.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/70a9d13b-a28b-4e6f-9c78-ec1c41361d96%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
