[tesseract-ocr] Training the LSTM language model explicitly in an unsupervised manner

Rahul Tyagi Mon, 15 Oct 2018 00:09:01 -0700

Hi,

I am trying to run tesseract-ocr on invoices to detect user IDs, invoice
numbers, tax codes, etc. I think Tesseract has not been trained on this kind
of data, so I need to fine-tune the network on my data. However, it will be
difficult for me to get the labelled data needed to fine-tune Tesseract as
described on the training-tesseract wiki page. So I wanted to know whether it
is possible to tune only the language model of tesseract-ocr in an
unsupervised way, just like the language models trained for English language
understanding, i.e. by showing the language model only the PINs and IDs and
passing the output generated at the previous timestep (t-1) as the input to
the current timestep (t).
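To make the idea in the question concrete, here is a minimal sketch of the self-supervised "previous token predicts next token" objective on unlabelled ID strings. It is NOT a Tesseract API and does not touch Tesseract's LSTM; it only illustrates how training targets can come from the text itself. A count-based character bigram model is swapped in for an LSTM to keep it short, and the names `make_pairs`, `train_bigram`, `next_char_probs`, the `^`/`$` markers, and the sample invoice IDs are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical start/end markers (assumption, not part of Tesseract)
BOS, EOS = "^", "$"


def make_pairs(text):
    """Turn one unlabelled string into (previous, next) training pairs.

    The target at step t is the character at t, and the input is the
    character at t-1, so no manual annotation is required.
    """
    chars = [BOS] + list(text) + [EOS]
    return list(zip(chars[:-1], chars[1:]))


def train_bigram(corpus):
    """Count how often each character follows each previous character."""
    counts = defaultdict(Counter)
    for text in corpus:
        for prev, nxt in make_pairs(text):
            counts[prev][nxt] += 1
    return counts


def next_char_probs(counts, prev):
    """Estimate P(next | prev) from counts; empty dict if prev was never seen."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}
    return {c: n / total for c, n in counts[prev].items()}


# Hypothetical invoice-style identifiers, used only for illustration
corpus = ["INV-1001", "INV-1002", "INV-2001"]
model = train_bigram(corpus)

print(make_pairs("AB"))              # (prev, next) pairs built from raw text
print(next_char_probs(model, "-"))   # which digit tends to follow "-"
```

An LSTM language model would replace the count table with a learned network, but the training pairs are built the same way: each position's input is the previous character and its target is the current one.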


