Thanks Shree, appreciate your support.

Regards
On Tuesday, September 4, 2018 at 7:25:33 PM UTC+1, shree wrote:
>
> My earlier suggestion of mixing the two kinds of images - scanned pages
> and text2image-created synthetic ones - was from before ocrd-train was
> available.
>
> ocrd-train works on single-line images, while tesstrain.sh works on
> multipage TIFFs. By mixing these, the single-line images will get more
> iterations during training.
>
> - pass_through_recoder is needed for complex scripts such as Indic
>   scripts and may not be needed for Latin-script-based languages.
>
> For fine-tuning, the number of iterations should be very low: about
> 300-400 for a new font and 3000-4000 for adding a new character. More
> iterations will lead to overfitting, as you are seeing.
>
> Please experiment with different options to see what works best for your
> language and test sets.
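
For reference, the low iteration counts suggested above could be wired into a fine-tuning run roughly as follows. This is only a sketch, assuming training is driven directly by the `lstmtraining` tool and its `--max_iterations` flag. The helper name `build_finetune_command` and every file path are hypothetical placeholders, and the caps are simply the upper ends of the quoted ranges. The script builds and prints the command; it does not run it.

```python
# Sketch only: builds (does not run) an lstmtraining command for fine-tuning.
# All file paths used below are hypothetical placeholders.

# Iteration caps taken from the upper ends of the ranges quoted above.
MAX_ITERATIONS = {
    "new_font": 400,        # 300-400 suggested for a new font
    "new_character": 4000,  # 3000-4000 suggested for adding a new character
}


def build_finetune_command(task, continue_from, traineddata,
                           train_listfile, model_output):
    """Return an argv list for lstmtraining with a deliberately low iteration cap."""
    if task not in MAX_ITERATIONS:
        raise ValueError(f"unknown task: {task!r}")
    # Note: adding a character typically needs further arguments
    # (for example the old traineddata); they are not shown in this sketch.
    return [
        "lstmtraining",
        "--continue_from", continue_from,
        "--traineddata", traineddata,
        "--train_listfile", train_listfile,
        "--model_output", model_output,
        "--max_iterations", str(MAX_ITERATIONS[task]),
    ]


cmd = build_finetune_command(
    task="new_font",
    continue_from="extracted/eng.lstm",          # hypothetical path
    traineddata="tessdata_best/eng.traineddata", # hypothetical path
    train_listfile="train/eng.training_files.txt",
    model_output="output/finetuned",
)
print(" ".join(cmd))
```

Keeping the cap in one small table makes it easy to try different values against your own test sets, as suggested above, and to stop before overfitting sets in.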

