For Tesseract 3.05:

Random text will work, but it is better to use character and word
combinations similar to those in the English training text.
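
As a minimal sketch of what "random text" could look like in practice,
assuming you have a plain wordlist with one word per line (as in langdata),
the Python below picks words at random and joins them into lines. The
function name, line counts and seed are my own illustrative choices and are
not part of the tesseract training tools.

```python
import random


def make_training_lines(words, num_lines=50, words_per_line=8, seed=0):
    """Build lines of randomly chosen words from a wordlist.

    Hypothetical helper for illustration only; not part of tesseract.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    words = [w.strip() for w in words if w.strip()]  # drop blank entries
    if not words:
        raise ValueError("wordlist is empty")
    lines = []
    for _ in range(num_lines):
        lines.append(" ".join(rng.choice(words) for _ in range(words_per_line)))
    return lines


if __name__ == "__main__":
    sample_words = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
    for line in make_training_lines(sample_words, num_lines=3, words_per_line=5):
        print(line)
```

Punctuation and digits would still have to be mixed in among the words by
hand or by a further step, since the wiki advises against grouping all the
non-letters together.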

It is unlikely that you will get answers to your questions from the developers.
You can search past issues and questions in the forum and on GitHub.

Training for 3.05 does not take long, so run a few experiments for your
'language' and test the results.
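
Before each experiment, it may help to check a candidate training text
against the wiki guidance quoted further down this thread (5 samples is OK
for rare characters, 10 is good, at least 20 for frequent ones). The
following is only a rough sketch; the function name and the default
threshold are assumptions for illustration, not something taken from the
tesseract code.

```python
from collections import Counter


def under_sampled_chars(text, min_count=10):
    """Return {char: count} for non-space characters seen fewer than min_count times."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return {ch: n for ch, n in counts.items() if n < min_count}


if __name__ == "__main__":
    sample = "aaaaaaaaaaaa bbb c"
    for ch, n in sorted(under_sampled_chars(sample).items()):
        print(f"{ch!r}: only {n} sample(s)")
```

Note that this only reports characters that appear in the text at least
once; characters missing entirely would have to be checked against your
own character list.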

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Apr 9, 2018 at 2:15 PM, Romil Mehla <meh...@gmail.com> wrote:

> Hi Shree, thanks for replying.
>
> For tesseract *3.05.00*
>
> I had already checked that link. There they mention:
> *"Make sure there are a minimum number of samples of each character. 10 is
> good, but 5 is OK for rare characters.*
> *There should be more samples of the more frequent characters - at least
> 20.*
> *Don't make the mistake of grouping all the non-letters together. Make the
> text more realistic"*
>
> Does this hold for the langdata eng.training_text? If yes, does that mean
> they are generating it randomly? How can randomly generated training text
> assure accuracy?
> Also, they mention that each character should have a minimum of 10 samples.
> Why so, and where in the code is this criterion used? I have checked the
> code but could not find this criterion anywhere. Is it related to an
> algorithm? If so, which one: the adaptive or the shape classifier, or is it
> related to the bounding box coordinates?
>
> Please clear my doubts and, if required, please pull in Ray or someone from
> the dev team as well, as I have doubts regarding the tesseract code too.
> I could not post in the tesseract-dev forum because questions should be
> asked on the tesseract user list only.
>
> So how can I get a tesseract developer to answer my question? Please tell
> me the way.
>
> Thanks again for your timely reply and help.
>
>
>
>
> On Sat, Apr 7, 2018 at 6:21 PM, ShreeDevi Kumar <shreesh...@gmail.com>
> wrote:
>
>> See https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla <meh...@gmail.com> wrote:
>>
>>> Thanks for your reply. I have read about tesseract 4.0, and Ray
>>> mentioned how he used so many files to train tesseract 4.0, but I don't
>>> want to use tesseract 4.0; I wanted to know about tesseract 3.05.00. From
>>> my understanding, for the eng language, say, the eng.training_text file is
>>> built from the eng.wordlist file in langdata. For a new language, how can I
>>> build training text from my new language's wordlist? Any idea who
>>> created the eng.training_text file? Is there any rule or algorithm to do
>>> so, or is it randomly generated from eng.wordlist while maintaining a
>>> minimum of 10 occurrences of each character in the training text?
>>>
>>> Please clarify this, and please let me know how to generate the
>>> training_text.
>>>
>>> On Saturday, April 7, 2018 at 3:46:10 PM UTC+5:30, shree wrote:
>>>>
>>>> Just a word list is not enough for training text.
>>>>
>>>> For tesseract 4.0.0 it needs to be representative of the text to be
>>>> recognized.
>>>>
>>>> On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, <meh...@gmail.com> wrote:
>>>>
>>>>> Is there any program to generate it? I see ambiguous_words.cpp
>>>>> generating dictionary words and ambiguous words. Where is it used? Or
>>>>> can it be used to build the unicharambigs file to generate rules?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2ce880b4-b750-4be9-a1a0-01f832f679df%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>
>
