I'm facing the same problem, but I'm afraid merging traineddata files is not implemented: you always have to train from scratch.
On this page: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 you will find the following sentence:

> ...but note that there is no incremental training mode that allows you to add new training data to existing sets.

The problem is probably that every character in a traineddata file has a sequential ID (1, 2, 3, 4, ...), and that every FontInfo and every feature has an ID as well. To merge two traineddata files you would have to renumber all of these IDs and detect which characters are defined in both files. I suppose this is very complicated. But a merge function would be a great feature! Maybe you could add it?
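To make the renumbering problem concrete, here is a minimal Python sketch under loudly stated assumptions: it models a character table as a plain dict from ID to character, and training samples as (ID, feature) pairs that refer to those IDs. The names `merge_char_tables` and `remap_samples` are hypothetical, and none of this reflects the real traineddata or unicharset file layout or any Tesseract API. It only illustrates the two steps described above: giving new characters fresh IDs in the merged set, rewriting every reference that still points at an old ID, and reporting characters that are defined in both sets.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, NOT the real traineddata/unicharset format:
# a table maps an integer ID to a character, and samples are
# (char_id, feature) pairs that refer to those IDs.
CharTable = Dict[int, str]
Sample = Tuple[int, str]


def merge_char_tables(
    a: CharTable, b: CharTable
) -> Tuple[CharTable, Dict[int, int], List[str]]:
    """Merge table b into table a.

    Returns the merged table, a mapping from b's old IDs to merged IDs,
    and the characters that are defined in both tables.
    """
    merged: CharTable = dict(a)
    by_char = {ch: i for i, ch in a.items()}
    next_id = max(a, default=-1) + 1  # first ID not used by table a
    remap: Dict[int, int] = {}
    duplicates: List[str] = []
    for old_id in sorted(b):
        ch = b[old_id]
        if ch in by_char:
            # Character already exists in a: reuse its ID and record the clash.
            remap[old_id] = by_char[ch]
            duplicates.append(ch)
        else:
            # New character: assign the next free ID in the merged table.
            merged[next_id] = ch
            by_char[ch] = next_id
            remap[old_id] = next_id
            next_id += 1
    return merged, remap, duplicates


def remap_samples(samples: List[Sample], remap: Dict[int, int]) -> List[Sample]:
    """Rewrite every ID reference in b's samples to the merged numbering."""
    return [(remap[char_id], feat) for char_id, feat in samples]


if __name__ == "__main__":
    a = {0: "a", 1: "b"}
    b = {0: "b", 1: "c"}
    merged, remap, dups = merge_char_tables(a, b)
    print(merged)  # {0: 'a', 1: 'b', 2: 'c'}
    print(remap)   # {0: 1, 1: 2}
    print(dups)    # ['b']
    print(remap_samples([(0, "f1"), (1, "f2")], remap))  # [(1, 'f1'), (2, 'f2')]
```

For a duplicated character the sketch simply reuses the existing ID. Deciding how to combine the two sets of training data for that character, and applying the same remapping to FontInfo IDs and feature IDs, is the part that would make a real merge tool hard.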

