[tesseract-ocr] Recognition of chemical formulas

Vadim Fedorov Mon, 17 Dec 2018 09:49:13 -0800

Hello everyone,

I need some advice. Would it make sense to train a separate model (datafile) 
exclusively for recognition of chemical formulas?
With the default English model, the following formula


[image: test5.png]
is recognized as "CONH(CH*5*)3N(C*o*H*s*)*o*" by the LSTM engine. So there are 
mistakes in subscripts. My intuition is that a model trained on chemical 
formulas only would be able to handle this better.
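
A simple way to check that intuition is to measure the character error rate of 
each model against hand-typed ground truth. The following is a minimal sketch 
in plain Python, not part of Tesseract: the function names are hypothetical, 
and the example strings are made-up placeholders rather than the contents of 
test5.png.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalized by the length of the ground truth."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)


if __name__ == "__main__":
    # Placeholder strings (assumption): one subscript digit read as a letter.
    print(character_error_rate("H2SO4", "HoSO4"))
```

Running the same set of formula images through both models and comparing the 
averaged rates would show whether the specialized model actually helps.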
What do you think?

