Re: [tesseract-ocr] Re: train tesseract OCR 4.0

srnsp92 Wed, 05 Apr 2017 01:32:42 -0700

Overview of Training Process 

The overall training process is similar to training 3.04 
<https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract>. 
Conceptually, it follows the same steps:


   1. Prepare training text.
   2. Render text to image + box file. (Or create hand-made box files for 
   existing image data.)
   3. Make unicharset file.
   4. Optionally make dictionary data.
   5. Run tesseract to process image + box file to make training data set.
   6. Run training on training data set.
   7. Combine data files.
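The data-preparation steps above (2, 3 and 5) can be sketched as the command lines a script would run for one font. This is a minimal sketch, assuming the tool names and flags shown on the TrainingTesseract-4.00 wiki (`text2image`, `unicharset_extractor --output_unicharset`, `tesseract ... lstm.train`); the helper `prep_commands`, the file names and the `<lang>.<font>.exp0` naming are assumptions, and flags changed between 4.0 alpha builds, so check them against your installed version. Step 4 (dictionary) and steps 6 and 7 are left out here.

```python
import shlex
from typing import List


def prep_commands(lang: str, font: str, text_file: str, fonts_dir: str) -> List[List[str]]:
    """Build argv lists for overview steps 2, 3 and 5 for a single font.

    Hypothetical helper for illustration; file names are assumptions.
    """
    # Output base in the <lang>.<fontname>.exp<N> style, e.g. "eng.Arial.exp0".
    base = f"{lang}.{font.replace(' ', '')}.exp0"
    return [
        # Step 2: render the training text to <base>.tif and <base>.box.
        ["text2image", f"--text={text_file}", f"--outputbase={base}",
         f"--font={font}", f"--fonts_dir={fonts_dir}"],
        # Step 3: build a unicharset from the characters in the box file.
        ["unicharset_extractor", "--output_unicharset", f"{lang}.unicharset",
         f"{base}.box"],
        # Step 5: run tesseract with the lstm.train config to write <base>.lstmf.
        ["tesseract", f"{base}.tif", base, "lstm.train"],
    ]


if __name__ == "__main__":
    for cmd in prep_commands("eng", "Arial", "training_text.txt", "/usr/share/fonts"):
        print(shlex.join(cmd))  # print each argv list as a shell-quoted command line
```

To execute rather than print, pass each list to `subprocess.run(cmd, check=True)`. With several fonts, the unicharset would normally be built from all of their box files at once.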

The key differences are:

   - The boxes only need to be at the *textline level.* It is thus *far 
   easier* to make training data from existing image data.
   - The .tr files are replaced by .lstmf data files.
   - Fonts *can and should be mixed freely* instead of being separate.
   - The clustering steps (mftraining, cntraining, shapeclustering) are 
   replaced with a single slow lstmtraining step.
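Because boxes only need to be at the textline level, a box file for an existing single-line image can be produced from its transcription alone. A minimal sketch, assuming a convention similar to the `generate_line_box.py` script in the later tesstrain repository: every character gets the whole line's bounding box (box entries are `symbol left bottom right top page`, origin at the bottom-left), and a tab-character entry marks the end of the line. The helper `line_box_lines` is hypothetical; it does not merge combining marks with their base character (which scripts such as Devanagari need), and spaces are emitted like any other character, which is worth checking against your Tesseract version.

```python
from typing import List


def line_box_lines(text: str, width: int, height: int, page: int = 0) -> List[str]:
    """Return box-file entries for one single-line image of width x height pixels.

    Every character gets the full-line box (0, 0, width, height) and a tab
    entry marks the end of the text line. Hypothetical helper.
    """
    line = text.strip()
    if not line:
        raise ValueError("empty transcription")
    # Box entry format: <symbol> <left> <bottom> <right> <top> <page>
    entries = [f"{ch} 0 0 {width} {height} {page}" for ch in line]
    entries.append(f"\t 0 0 {width} {height} {page}")  # end-of-line marker
    return entries


if __name__ == "__main__":
    # Assumed example: a 420x48 line image whose transcription is "Hello".
    print("\n".join(line_box_lines("Hello", 420, 48)))
```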



Hello ShreeDevi,


Could you please guide me through the steps marked above in more detail? I 
am not able to find the relevant instructions for them.


The steps I am following give me the errors I posted in my previous 
reply. 


Please guide me.





On Wednesday, April 5, 2017 at 9:07:40 AM UTC+5:30, shree wrote:
>
> Read
>
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
>
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
>
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer
>
> and
>
> https://github.com/tesseract-ocr/tesseract/wiki/Documentation
>
> https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>
> https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
>
> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
>
> https://github.com/tesseract-ocr/tesseract/wiki/FAQ
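For the question further down about adding to an existing traineddata file without replacing it, the Finetune page linked above describes continuing training from the LSTM model extracted from an existing traineddata file. A minimal sketch of that sequence, assuming the flag names shown on later revisions of that page (`combine_tessdata -e`, `lstmtraining --continue_from/--traineddata/--model_output/--train_listfile/--max_iterations`, then `--stop_training`); early 4.0 alpha builds used a somewhat different sequence. The helper `finetune_commands`, the paths and the 400-iteration default are hypothetical. The result is written under a new name, so the original file is only read.

```python
import os
from typing import List


def finetune_commands(base_traineddata: str, train_list: str, out_dir: str,
                      new_name: str, max_iterations: int = 400) -> List[List[str]]:
    """Build argv lists to fine-tune an existing model into <new_name>.traineddata.

    Hypothetical helper; flag names follow the 4.00 Finetune wiki page.
    """
    lang = os.path.splitext(os.path.basename(base_traineddata))[0]  # e.g. "eng"
    lstm = f"{out_dir}/{lang}.lstm"
    model_output = f"{out_dir}/{new_name}"
    return [
        # Extract the LSTM model from the existing traineddata (file is only read).
        ["combine_tessdata", "-e", base_traineddata, lstm],
        # Continue training from that model on the new .lstmf files.
        ["lstmtraining", "--continue_from", lstm,
         "--traineddata", base_traineddata,
         "--model_output", model_output,
         "--train_listfile", train_list,
         "--max_iterations", str(max_iterations)],
        # Turn the latest checkpoint into a standalone traineddata with a new name.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{model_output}_checkpoint",
         "--traineddata", base_traineddata,
         "--model_output", f"{model_output}.traineddata"],
    ]


if __name__ == "__main__":
    for cmd in finetune_commands("tessdata/eng.traineddata", "eng.training_files.txt",
                                 "output", "eng_custom"):
        print(" ".join(cmd))
```

Copying `eng_custom.traineddata` into the tessdata directory then lets you pick `-l eng` or `-l eng_custom` per run, so the original model stays available.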
>
>
>
>
> ShreeDevi
> ____________________________________________________________
> Bhajans, kirtans and aartis (devotional songs) @ http://bhajans.ramparivar.com
>
> On Wed, Apr 5, 2017 at 12:54 AM, <[email protected]> wrote:
>
>> Can you please share your experiences in this thread? There are no posts 
>> yet about training Tesseract 4.
>>
>> 1) Also, is there any way to add a newly trained data file to an old 
>> traineddata file without replacing the old file?
>> 2) If we don't know which fonts may appear in our images, how should we 
>> proceed with training Tesseract?
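On the question of unknown fonts, the overview above says fonts can and should be mixed freely, so one approach is to render the same training text in many fonts and feed all the resulting `.lstmf` files to `lstmtraining` through one shuffled list file (its `--train_listfile` argument, one file name per line). A minimal sketch, assuming the `<lang>.<font>.exp<N>.lstmf` naming from the 3.04 training guide; the helpers `mixed_font_list` and `write_list_file`, the font names and the seed are hypothetical.

```python
import random
from pathlib import Path
from typing import Iterable, List


def mixed_font_list(lang: str, fonts: Iterable[str], pages: int, seed: int = 0) -> List[str]:
    """Return .lstmf names for every font/page pair, shuffled so fonts interleave.

    Hypothetical helper; assumes files named <lang>.<font>.exp<N>.lstmf.
    """
    names = [f"{lang}.{font.replace(' ', '')}.exp{n}.lstmf"
             for font in fonts for n in range(pages)]
    random.Random(seed).shuffle(names)  # same seed gives the same order
    return names


def write_list_file(path: str, names: List[str]) -> None:
    # One .lstmf file name per line, for use as lstmtraining's --train_listfile.
    Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")


if __name__ == "__main__":
    fonts = ["Arial", "Times New Roman", "DejaVu Sans Mono"]
    for name in mixed_font_list("eng", fonts, pages=2):
        print(name)
```

Shuffling keeps consecutive training samples from coming from a single font, which is the point of mixing fonts rather than training them separately.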
>>
>> On Tuesday, April 4, 2017 at 9:27:06 PM UTC+5:30, Saurabh Srivastav wrote:
>>>
>>> Yes, I trained my Tesseract for an English (eng) font and got it to read 
>>> the characters from an image.
>>>
>>>> thanks,
>>>>> Saurabh Srivastav
>>>>>
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/34ce1784-970d-4b42-8cb6-846fe63c5393%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
