Actually, for English + Hindi, use `script/Devanagari.traineddata`; for English + Bengali, try `eng+ben` or `script/Bengali`.
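
A minimal sketch of how these language choices map onto tesseract command lines, assuming the matching traineddata files (including the `script/` subdirectory) are already installed in the tessdata directory. The helper name `build_tesseract_command` and the image file names are hypothetical, chosen only for illustration; the script prints the commands and does not run them.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Return the argv list for one tesseract run.

    `languages` is a list such as ["eng", "ben"]; tesseract expects
    multiple models joined with "+" after the -l option.
    """
    if not languages:
        raise ValueError("at least one language or script model is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Hypothetical page images; replace with real scans.
commands = [
    # English + Hindi via the Devanagari script model
    build_tesseract_command("page_hi.png", "page_hi", ["script/Devanagari"]),
    # English + Bengali via two language models
    build_tesseract_command("page_bn.png", "page_bn", ["eng", "ben"]),
    # English + Russian, as suggested earlier in the thread
    build_tesseract_command("page_ru.png", "page_ru", ["eng", "rus"]),
]

if __name__ == "__main__":
    for cmd in commands:
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell once the image and traineddata files exist; comparing the output of `eng+ben` against `script/Bengali` on the same page is a quick way to see which works better for a given book.
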
Please check the language code for Russian.

On Wed, Feb 20, 2019 at 11:02 AM Shree Devi Kumar <[email protected]> wrote:

> Please share a couple of scanned pages for testing.
>
> You may be able to use existing traineddata files for English and Russian
> with -l eng+rus or for English and Hindi with -l eng+hin
>
> For text with diacritics you can try -l script/Latin
>
> This will give you an idea of the current state. You can plan training after
> that.
>
> On Wed, 20 Feb 2019, 10:20 Alexander Gribanov <[email protected]> wrote:
>
>> Hello!
>>
>> I just found Tesseract and it seems a great and powerful instrument,
>> but as we say in Russia, equipment in the hands of a fool is
>> scrap metal...
>>
>> So please, could somebody be kind enough to advise me
>> step by step:
>> 1. What to do
>> 2. What to read/watch
>> 3. Take a look at the result and give me a hint where to go next
>>
>> My situation is that I have a lot of scanned (and many not yet scanned)
>> books in mixed languages,
>> like English, Russian, Hindi, Bengali, sometimes with diacritic
>> symbols, etc...
>> For most of them, I have no idea whether any fonts are available, or which
>> fonts they were printed with...
>>
>> But I'm ready to select some letters, words, etc. on the image the first time,
>> then tell the program which Unicode character each letter on the image is
>> (not sure what this is correctly called).
>> This way it may be possible to create the missing fonts.
>>
>> As I understand it, training a neural network is a kind of spiral process:
>> 1. We have an image
>> 2. We tell the network which part of the image is a symbol and what
>> that symbol is (character code).
>> This becomes training material
>> 3. The network, based on its first small experience (let's say 1 page), tries
>> to recognize the 2nd page
>> 4. We verify and correct if needed. This becomes more training material
>>
>> And so on: steps 3-4 repeat until the whole book is recognized.
>> Sometimes step 2 will be invoked for new characters or patterns, etc.
>>
>> I think this should be enough to understand my level on the
>> subject and my goal,
>> so I ask, please, if anybody would like to help me establish the
>> process
>> to recognize many rare books, to be able to search and navigate among
>> tons of scriptures which would otherwise be lost and buried by time...
>>
>> Thank you all very much,
>> best regards, Alexander
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f4d5673a-31f4-4c2b-91f2-6cb843943a41%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

