[tesseract-ocr] Re: New language traineddata based on the existing one.

Albrecht Hilker Fri, 04 Jul 2014 11:48:06 -0700

I'm facing the same problem.
But I fear that merging traineddata files is not implemented.
You always have to train from scratch.



On this page:
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
you find the following sentence:
>> ....but note that there is no incremental training mode that allows you
>> to add new training data to existing sets.


The problem is probably that every character in a traineddata file has an
ID (numbered 1, 2, 3, 4, ...), and that every FontInfo entry and every
feature has an ID as well.
To merge two traineddata files you would have to renumber all these IDs and
detect which characters are defined in both files.
I suppose that this is very complicated.
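
To show what I mean by renumbering, here is a rough Python sketch. It is only
an illustration under my own assumptions: the plain {character: ID} dicts and
the function name merge_id_tables are hypothetical, this is not the real
traineddata format, and it does not touch the FontInfo or feature IDs at all.

```python
def merge_id_tables(first, second):
    """Merge two hypothetical {character: id} tables into one.

    Returns (merged, remap_first, remap_second, duplicates), where each
    remap maps an old ID from that table to its new ID in the merged table.
    """
    merged = {}
    duplicates = []
    remaps = []
    for table in (first, second):
        remap = {}
        # Visit characters in order of their old ID to keep numbering stable.
        for char, old_id in sorted(table.items(), key=lambda item: item[1]):
            if char in merged:
                # Character already came from the other table: reuse its ID.
                duplicates.append(char)
            else:
                # New character: assign the next free ID, counting from 1.
                merged[char] = len(merged) + 1
            remap[old_id] = merged[char]
        remaps.append(remap)
    return merged, remaps[0], remaps[1], duplicates


if __name__ == "__main__":
    table_a = {"a": 1, "b": 2, "c": 3}
    table_b = {"b": 1, "d": 2}
    merged, remap_a, remap_b, dups = merge_id_tables(table_a, table_b)
    print(merged)   # merged character table
    print(remap_b)  # old ID -> new ID for the second table
    print(dups)     # characters defined in both tables
```

Even in this toy version every reference to an old ID in the second table has
to be rewritten through the remap, and a real merge would have to do the same
for every structure that stores those IDs.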

But it would be a great feature to have a merge function!
Maybe you could add it?

