Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

ShreeDevi Kumar Thu, 01 Mar 2018 06:26:03 -0800

Tesseract 4.00 alpha has two OCR engines. One is the legacy tesseract
engine, which was used in 3.0x, and the other is the neural-net-based LSTM
engine, available in 4.00alpha (the master branch on GitHub).


The traineddata files in tesseract-ocr/tessdata have language models
compatible with both of these engines. If you unpack the traineddata files
with combine_tessdata -u, you will see that the files from
tesseract-ocr/tessdata contain more components than those from
tessdata_best or tessdata_fast.
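
To check this without reading the listing by eye, here is a minimal Python
sketch (not part of Tesseract) that parses the component lines printed by
combine_tessdata -u, in the "index:name:size=..., offset=..." form quoted
further down in this thread. The names Component, parse_listing and
is_lstm_only are made up for illustration, and treating every component
whose name starts with "lstm" as belonging to the LSTM engine is an
assumption based on that listing.

```python
import re
from typing import List, NamedTuple


class Component(NamedTuple):
    index: int
    name: str
    size: int
    offset: int


# Matches listing lines such as "17:lstm:size=973837, offset=282".
_LINE = re.compile(r"^(\d+):([\w-]+):size=(\d+), offset=(\d+)$")


def parse_listing(text: str) -> List[Component]:
    """Collect the component entries from a combine_tessdata -u listing."""
    components = []
    for raw in text.splitlines():
        # Drop e-mail quote markers (">>> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        match = _LINE.match(line)
        if match:
            idx, name, size, offset = match.groups()
            components.append(Component(int(idx), name, int(size), int(offset)))
    return components


def is_lstm_only(components: List[Component]) -> bool:
    """Heuristic (an assumption): apart from config and version,
    every component name starts with "lstm"."""
    names = [c.name for c in components if c.name not in ("config", "version")]
    return bool(names) and all(name.startswith("lstm") for name in names)
```

For the tessdata_fast listing in this thread, is_lstm_only(parse_listing(...))
comes out True; a file from tesseract-ocr/tessdata would also list the
legacy components, so it should come out False.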

While most languages are supposed to have better accuracy with the newer
LSTM based engine and models, there are certain cases in which legacy
tesseract is better. Hence it is still being supported.

tessdata_best files are the most accurate and can be used as the base for
further fine-tuning. These are only for the LSTM based engine.

tessdata_fast files are integer versions of the models: still accurate but
faster in processing, so they are recommended for regular OCR. These are
only for the LSTM based engine.
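
One way to try each set is to point tesseract at the corresponding
directory. The Python sketch below only builds the command lines. It
assumes the three repositories are checked out into local directories named
./tessdata, ./tessdata_best and ./tessdata_fast; page.png, the output names
and the helper tesseract_command are placeholders. --tessdata-dir, -l and
--oem are tesseract command-line options (--oem 0 selects the legacy engine,
--oem 1 the LSTM engine).

```python
from typing import List


def tesseract_command(image: str, out_base: str, tessdata_dir: str,
                      lang: str = "kor", oem: int = 1) -> List[str]:
    """Build the argument list for one tesseract run (nothing is executed)."""
    return [
        "tesseract", image, out_base,
        "--tessdata-dir", tessdata_dir,  # directory holding <lang>.traineddata
        "-l", lang,
        "--oem", str(oem),               # 0 = legacy engine, 1 = LSTM engine
    ]


# Hypothetical local checkouts; only ./tessdata carries the legacy models.
COMMANDS = [
    tesseract_command("page.png", "out_legacy", "./tessdata", oem=0),
    tesseract_command("page.png", "out_lstm", "./tessdata", oem=1),
    tesseract_command("page.png", "out_best", "./tessdata_best"),
    tesseract_command("page.png", "out_fast", "./tessdata_fast"),
]
```

Each entry can be passed to subprocess.run(cmd, check=True) once tesseract
and the traineddata files are actually installed.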

The best way for you to compare these is to take a set of test images, OCR
them with each of the traineddata files and compare the accuracy using OCR
evaluation software such as
https://sites.google.com/site/textdigitisation/ocrevaluation
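
For a quick first look before setting up such a tool, a rough character
error rate can be computed by hand. The following Python sketch uses plain
Levenshtein edit distance against a ground-truth transcription. It is a
simplification and not necessarily what the tool above computes; the
function names are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(truth: str, ocr: str) -> float:
    """Edit distance divided by the length of the ground truth."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)
```

Run it on the text produced with each traineddata file for the same image;
the lower value indicates the better result for that image.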


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Mar 1, 2018 at 6:51 PM, 이경준 <[email protected]> wrote:

> Oh, I see ㅜㅜㅜ Thank you ㅜㅜㅜㅜ I was really impressed by you.
>
> OK. Thank you very much.
>
> Last question: I cannot understand the trained data types.
>
> Do you mean that in tesseract 4.0, tessdata_best is better than
> tessdata? ㅜㅜㅜ
>
> And what is tessdata_fast ㅜㅜㅜㅜㅜㅜ ???? "Fast integer versions of trained
> models"
>
> ㅜㅜ Sorry ㅜㅜㅜ please help me ...
> ....ㅜㅜ
>
> On Thursday, March 1, 2018 at 10:10:18 PM UTC+9, shree wrote:
>>
>> > I would like to make a customized, trained "new traineddata"
>>
>> OK. But training from scratch takes a lot of time. I assume that you want
>> to fine-tune.
>>
>> Please note that the traineddata files in tessdata, tessdata_best and
>> tessdata_fast are NOT compatible with each other. Which one to use depends
>> on which version of the tesseract program you are using.
>>
>> I have already sent you the bash script that you can modify for
>> training.
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Thu, Mar 1, 2018 at 6:36 PM, ShreeDevi Kumar <[email protected]>
>> wrote:
>>
>>> > combine_tessdata -u kor.traineddata What does that mean? Could you
>>> explain it to me?
>>>
>>> That command will show and unpack the components of your traineddata
>>> file.
>>>
>>> e.g. from tessdata_fast:
>>>
>>> combine_tessdata -u ./tessdata_fast/kor.traineddata ./tessdata_fast/kor.
>>> Extracting tessdata components from ./tessdata_fast/kor.traineddata
>>> Wrote ./tessdata_fast/kor.config
>>> Wrote ./tessdata_fast/kor.lstm
>>> Wrote ./tessdata_fast/kor.lstm-punc-dawg
>>> Wrote ./tessdata_fast/kor.lstm-word-dawg
>>> Wrote ./tessdata_fast/kor.lstm-number-dawg
>>> Wrote ./tessdata_fast/kor.lstm-unicharset
>>> Wrote ./tessdata_fast/kor.lstm-recoder
>>> Wrote ./tessdata_fast/kor.version
>>> Version string:4.00.00alpha:kor:synth20170629:[1,48,0,1Ct3,3,16Mp3,3
>>> Lfys64Lfx96Lrx96Lfx384O1c1]
>>> 0:config:size=90, offset=192
>>> 17:lstm:size=973837, offset=282
>>> 18:lstm-punc-dawg:size=2602, offset=974119
>>> 19:lstm-word-dawg:size=605274, offset=976721
>>> 20:lstm-number-dawg:size=74, offset=1581995
>>> 21:lstm-unicharset:size=76228, offset=1582069
>>> 22:lstm-recoder:size=19034, offset=1658297
>>> 23:version:size=80, offset=1677331
>>>
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/633868d4-5943-46a5-b584-1a32a89131b7%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
>

