Actually, for English + Hindi, use `-l script/Devanagari` (the `script/Devanagari.traineddata` file);
for English + Bengali, try `-l eng+ben` or `-l script/Bengali`.

Please check the language code for Russian.
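
As a rough sketch of how the combinations above could be tried side by side, the Python snippet below only builds the command lines and prints them; it does not run Tesseract. `page.png`, the output names and the helper `tesseract_command` are made up for illustration, and it assumes the `script/` models sit in a `script` subfolder of your tessdata directory.

```python
import shlex


def tesseract_command(image, output_base, models, tessdata_dir=None):
    """Build a tesseract CLI call; `models` are joined with '+' for the -l option."""
    cmd = ["tesseract", image, output_base, "-l", "+".join(models)]
    if tessdata_dir is not None:
        # Optional: point tesseract at a specific tessdata folder.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


# Model combinations mentioned in this thread (names are illustrative keys).
candidates = {
    "english_hindi": ["script/Devanagari"],
    "english_bengali": ["eng", "ben"],
    "english_bengali_script": ["script/Bengali"],
}

for name, models in candidates.items():
    # page.png is a placeholder for one of your scanned pages.
    print(shlex.join(tesseract_command("page.png", "out_" + name, models)))
```

Running the printed commands on the same scanned page and comparing the resulting text files should show which model combination suits a given book best.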

On Wed, Feb 20, 2019 at 11:02 AM Shree Devi Kumar <[email protected]>
wrote:

> Please share a couple of scanned pages for testing.
>
> You may be able to use existing traineddata files for English and Russian
> with -l eng+rus or for English and Hindi with -l eng+hin
>
> For text with diacritics you can try -l script/Latin
>
> This will give you an idea of current state. You can plan training after
> that.
>
> On Wed, 20 Feb 2019, 10:20 Alexander Gribanov <[email protected]>
> wrote:
>
>> Hello!
>>
>> I just found Tesseract and it seems a great and powerful instrument,
>> but as we say in Russia, equipment in the hands of a fool is
>> scrap metal...
>>
>> So please, if somebody would be kind enough to advise me
>> step by step:
>> 1. What to do
>> 2. What to read/watch
>> 3. Take a look at the result and give me a hint where to go next
>>
>> My situation is that I have a lot of scanned (and many not yet scanned)
>> books in mixed languages,
>> like English, Russian, Hindi, Bengali, sometimes with diacritic
>> symbols, etc...
>> For most of them, I have no idea which fonts they were printed with
>> or whether those fonts are available...
>>
>> But I'm ready to start by selecting some letters, words, etc.
>> on the image,
>> and then telling the program which Unicode character each letter in the
>> image corresponds to (not sure what this is properly called).
>> This way it may be possible to create the missing fonts.
>>
>> As I understand it, training a neural network is a kind of spiral process:
>> 1. We have an image
>> 2. We tell the network which part of the image is a symbol and what
>> that symbol is (character code).
>>     This becomes training material
>> 3. Based on that first small experience (let's say 1 page), the network
>> tries to recognize the 2nd page
>> 4. We verify and correct if needed. This becomes more training material
>>
>> And so on: steps 3-4 repeat until the whole book has been
>> recognized.
>> Sometimes step 2 will be invoked again for new characters or patterns, etc.
>>
>> I think this should be enough to understand my level on the
>> subject and my goal,
>> so I ask, please, if anybody would like to help me establish a
>> process
>> to recognize many rare books, so that one can search and navigate among
>> tons of scriptures which would otherwise be lost and buried by time...
>>
>> Thank You all very much,
>> best regards, Alexander
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f4d5673a-31f4-4c2b-91f2-6cb843943a41%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/f4d5673a-31f4-4c2b-91f2-6cb843943a41%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

