Please take a look at the related issue about subscripts/superscripts (in the langdata or tessdata repo).
As far as I understand, the normalization routines currently in use convert them to regular digits. That seems to be why training did not help in my fine-tuning trial. However, you can give it a try and share your results.

On Mon, 17 Dec 2018, 12:48 Vadim Fedorov <coder...@gmail.com> wrote:

> Hello everyone,
>
> I need some advice. Would it make sense to train a separate model (datafile)
> exclusively for recognizing chemical formulas?
> With the default model for English, the following formula
>
> [image: test5.png]
> is recognized as "CONH(CH*5*)3N(C*o*H*s*)*o*" by the LSTM engine, so there
> are mistakes in the subscripts. My intuition is that a model trained on
> chemical formulas only would be able to handle this better.
> What do you think?
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/a5704736-173a-4e21-a532-26595d94589b%40googlegroups.com
> For more options, visit https://groups.google.com/d/optout.
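A rough illustration of the normalization point in the reply above, assuming it behaves like Unicode NFKC compatibility normalization (an assumption; Tesseract's own normalization code may apply different rules). Python's standard `unicodedata` module folds subscript and superscript digits into ordinary digits, so a formula written as H₂O becomes indistinguishable from H2O. The helper name `normalize_like_nfkc` and the sample strings are hypothetical, chosen only for illustration.

```python
import unicodedata


def normalize_like_nfkc(text: str) -> str:
    """Apply Unicode NFKC compatibility normalization.

    Hypothetical stand-in for the OCR training-text normalization;
    NFKC maps subscript/superscript digits to plain ASCII digits.
    """
    return unicodedata.normalize("NFKC", text)


# Illustrative strings using Unicode subscript (U+2080..U+2089)
# and superscript digits.
samples = ["H₂O", "C₂H₅OH", "x²"]

for s in samples:
    print(f"{s!r} -> {normalize_like_nfkc(s)!r}")
# 'H₂O' -> 'H2O'
# 'C₂H₅OH' -> 'C2H5OH'
# 'x²' -> 'x2'

# Both spellings collapse to the same string, so the
# subscript/baseline distinction is lost after normalization.
print(normalize_like_nfkc("H₂O") == normalize_like_nfkc("H2O"))  # True
```

If the training text is folded this way before training, the model never sees subscript digits as separate characters, which would be consistent with fine-tuning not improving subscript recognition.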