Hi Shree,

Thanks for your reply. Is there any option to use tesstrain.sh in Tesseract 
4.0 to generate the traineddata and lstmf files from the images and 
box files? Or do I still have to go through the process listed in the 
Tesseract 3.0 instructions? In that case, I would be able to generate the 
traineddata file (and the unicharset file, I think), but not the lstmf file. 
How can I generate the lstmf file? Is there a tool I can use?

Thanks again,
Dennis

On Friday, April 13, 2018 at 5:19:47 AM UTC-7, shree wrote:
>
> Training Tesseract 4.0 from images is not officially supported. Different 
> people have had success doing LSTM training with box/tiff pairs, but it 
> requires hacks/programming on their part to create 4.0.0-compatible box 
> files. 
>
> tesstrain.sh creates box/tiff files in the /tmp directory; these are used 
> to create the lstmf files for LSTM training. tesstrain.sh can create either 
> a 3.0x-compatible traineddata or a 4.0.0-compatible starter traineddata, 
> depending on the options chosen. For 4.0.0, this starter traineddata, along 
> with the lstmf files, is used for LSTM training.
>
> The format of traineddata files for 3.0x and 4.0.0 is different.
>
> For the different components of a traineddata file, see
>
>
> https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc
>
> For creating 4.0-compatible box files, see
>
> https://github.com/tesseract-ocr/langdata/issues/83#issuecomment-375247341
>
>
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#training-tesseract-lstm-engine
>
> Please note that all these are unsupported options.
>
>
> ShreeDevi
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Apr 13, 2018 at 12:09 PM, <denni...@berkeley.edu> wrote:
>
>> Hi all,
>>
>> I read in a different post that training Tesseract 4.0 from images is not 
>> supported. Is this true? I have been able to successfully train Tesseract 
>> 4.0 so far using font data. When using tesstrain.sh, the script creates a 
>> number of files, including an lstmf file alongside the usual traineddata 
>> file (and some others, like unicharset). I was wondering whether it is 
>> possible to use the traineddata generation from image and box file described 
>> in the Tesseract 3.0 training instructions to create these training files 
>> for Tesseract 4.0. The Tesseract 3.0 instructions already produce a 
>> traineddata file; how can I generate the lstmf file (and the others), if 
>> that is possible?
>>
>> Thank you,
>> Dennis
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To post to this group, send email to tesser...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/bc664de6-5386-45b3-ae4d-70ac5338938c%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
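
As a rough illustration of the lstmf step described in the quoted reply, the sketch below builds the tesseract call that turns one image/box pair into an .lstmf file and writes the list file that lstmtraining reads through --train_listfile. It assumes the box files are already in a 4.0-compatible format and share a base name with their images. The function names, the train/ directory, and the default page segmentation mode (6, a single text block) are illustrative assumptions, not something tesstrain.sh prescribes.

```python
import subprocess
from pathlib import Path
from typing import List


def lstmf_command(image: Path, psm: int = 6) -> List[str]:
    """Build the tesseract call for one image/box pair (hypothetical helper).

    Assumes a 4.0-compatible box file with the same base name sits next to
    the image, e.g. eng.font.exp0.tif and eng.font.exp0.box.
    """
    # tesseract appends ".lstmf" to the output base when run with lstm.train
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base),
            "--psm", str(psm), "lstm.train"]


def write_train_list(images: List[Path], list_file: Path) -> List[str]:
    """Write one expected .lstmf path per line and return those paths."""
    lstmf_paths = [str(img.with_suffix(".lstmf")) for img in images]
    list_file.write_text("\n".join(lstmf_paths) + "\n")
    return lstmf_paths


def generate_all(train_dir: Path) -> None:
    """Run tesseract on every .tif in train_dir, then write the list file."""
    images = sorted(train_dir.glob("*.tif"))
    for img in images:
        # check=True stops at the first image tesseract fails on
        subprocess.run(lstmf_command(img), check=True)
    write_train_list(images, train_dir / "list.train")
```

Calling generate_all(Path("train")) would need tesseract on the PATH and the lstm.train config installed with the tessdata configs. The resulting list file, together with the starter traineddata, is what lstmtraining takes as input. Whether the box files are accepted depends on the 4.0 box format discussed in the links above, so treat this as unsupported in the same sense as the rest of this thread.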

