Please take a look at the related issue regarding subscripts/superscripts (in
the langdata or tessdata repos).

As far as I understand, the normalization routines currently in use convert
them to regular digits. Hence, training did not seem to help in my
fine-tuning trial.

However, you can give it a try and share your results.
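
To illustrate the normalization point above: a minimal sketch using Python's
standard unicodedata module, showing how Unicode compatibility normalization
(NFKC) folds subscript and superscript digits into plain digits. Whether the
Tesseract training tools apply exactly this form is an assumption on my part;
the helper name fold_compat and the sample strings are hypothetical and are
not taken from the formula in the original post.

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Return text after NFKC normalization.

    NFKC replaces compatibility characters (for example the subscript
    digit U+2082) with their plain equivalents (the digit "2").
    Assumption: this only approximates what a training pipeline might do.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Hypothetical samples containing subscript and superscript digits.
    for sample in ("H₂O", "CH₃", "x²"):
        print(repr(sample), "->", repr(fold_compat(sample)))
```

If the ground-truth text goes through a step like this, the subscript
characters never reach the network as distinct classes, which would be
consistent with fine tuning not helping.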


On Mon, 17 Dec 2018, 12:48 Vadim Fedorov <coder...@gmail.com wrote:

> Hello everyone,
>
> I need some advice. Would it make sense to train a separate model (datafile)
> exclusively for recognition of chemical formulas?
> With the default model for English the following formula
>
> [image: test5.png]
> is recognized as "CONH(CH*5*)3N(C*o*H*s*)*o*" by the LSTM engine. So there
> are mistakes in the subscripts. My intuition is that a model trained on
> chemical formulas only would be able to handle this better.
> What do you think?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a5704736-173a-4e21-a532-26595d94589b%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/a5704736-173a-4e21-a532-26595d94589b%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
