Hi,

I am trying to train Tesseract 4 to recognize the trademark symbol ™. I 
followed the examples on the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replacing-Top-Layer-Example
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Replace-Top-Layer

I am using German for testing. With the traineddata from the repository, the 
trademark symbol is usually recognized as '" or some other variation of 
quotes. So I created a training text that includes the trademark symbol and 
started the training process. I replaced only the top layer, as in the 
example, but the trademark symbol is still not recognized properly: with 
the newly generated traineddata it is recognized as TM. I have 
several questions. 

1. Do I need to replace more layers?
2. How large should the training text be? (Mine is based on the one in the 
langdata/deu directory.)
3. I noticed that the symbols © and ® are present. Why is the trademark 
symbol missing?
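
In case it helps to narrow things down, here is a minimal Python sketch of 
two checks that may be relevant. It counts how often ™ appears in a training 
text, and it shows what Unicode NFKC normalization does to the symbol. The 
sample string and the helper names are made up for illustration. Whether the 
Tesseract training tools apply this normalization is an assumption on my 
part, not something the wiki pages above confirm.

```python
import unicodedata

TRADEMARK = "\u2122"  # the trademark symbol ™

def count_symbol(text: str, symbol: str = TRADEMARK) -> int:
    """Return how many times `symbol` occurs in `text`."""
    return text.count(symbol)

def nfkc_form(symbol: str = TRADEMARK) -> str:
    """Return the NFKC-normalized form of `symbol`."""
    return unicodedata.normalize("NFKC", symbol)

# Hypothetical sample line standing in for the real training text.
sample = "Produkt\u2122 und Marke\u2122, Copyright \u00a9, Registered \u00ae"

print(count_symbol(sample))   # occurrences of the trademark symbol in the sample
print(repr(nfkc_form()))      # compatibility-normalized form of the symbol
print(repr(nfkc_form("\u00a9")), repr(nfkc_form("\u00ae")))
```

If the count for the real training text is very low, or if the normalized 
form differs from the original symbol, either could be worth looking into 
before changing the network layers.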

Any other hints would be appreciated.

Thank you for your time,
Martin

P.S. Thanks also for the great work on Tesseract OCR. 
