Please share a couple of scanned pages for testing.

You may be able to use the existing traineddata files for English and Russian
with -l eng+rus, or for English and Hindi with -l eng+hin.

For text with diacritics, you can try -l script/Latin.
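
If you would rather script these test runs than type them by hand, here is a
minimal sketch in Python. It only wraps the standard command line
(tesseract IMAGE OUTPUTBASE -l LANG[+LANG]). The helper names
build_tesseract_command and run_ocr and the file name page1.png are
hypothetical, and it assumes the tesseract binary is on your PATH and that the
matching traineddata files (including script/Latin.traineddata in a script
subfolder of tessdata) are installed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Build the argv list for one tesseract run.

    `languages` is a list such as ["eng", "rus"]; the names are joined
    with "+" because that is how -l accepts multiple languages.
    """
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages):
    """Run tesseract and return the path of the text file it writes.

    Assumption: tesseract is installed and the requested traineddata
    files are present in its tessdata directory.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, languages)
    subprocess.run(cmd, check=True)
    # By default tesseract writes plain text to <output_base>.txt
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical scan; try each language combination on the same page.
    for langs in (["eng", "rus"], ["eng", "hin"], ["script/Latin"]):
        out_base = "page1_" + "_".join(langs).replace("/", "-")
        print(build_tesseract_command("page1.png", out_base, langs))
```

Running the same page through each combination and comparing the resulting
text files should show quickly which model is closest to what you need.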

This will give you an idea of the current state of recognition. You can plan
training after that.

On Wed, 20 Feb 2019, 10:20 Alexander Gribanov <[email protected]> wrote:

> Hello!
>
> I just found Tesseract, and it seems a great and powerful instrument,
> but as we say in Russia, equipment in the hands of a fool is
> scrap metal...
>
> So please, could somebody kindly give me step-by-step advice:
> 1. What to do
> 2. What to read/watch
> 3. Take a look at the result and give me a hint where to go next
>
> My situation is that I have a lot of scanned (and many not yet scanned)
> books in mixed languages,
> like English, Russian, Hindi, Bengali, sometimes with diacritic
> symbols, etc.
> For most of them, I have no idea which fonts they were printed with,
> or whether those fonts are available.
>
> But I'm ready to start by selecting some letters, words, etc. on the image,
> and then tell the program which Unicode character each letter in the image
> corresponds to (I'm not sure what this is correctly called).
> This way it may be possible to create the missing fonts.
>
> As I understand it, training a neural network is a kind of spiral process:
> 1. We have an image
> 2. We tell the network which part of the image is a symbol and what
> that symbol is (character code).
>     This becomes training material
> 3. Based on this first small experience (let's say 1 page), the network
> tries to recognize the 2nd page
> 4. We verify and correct if needed. This becomes more training material
>
> And so on: steps 3-4 repeat until the whole book is
> recognized.
> Sometimes step 2 will be invoked for new characters or patterns, etc.
>
> I think this should be enough to convey my level on the subject
> and my goal,
> so I ask, please, if anybody would like to help me establish a
> process
> to recognize many rare books, to be able to search and navigate among
> tons of scriptures which would otherwise be lost and buried by time...
>
> Thank you all very much,
> best regards, Alexander
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f4d5673a-31f4-4c2b-91f2-6cb843943a41%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/f4d5673a-31f4-4c2b-91f2-6cb843943a41%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
