Hello everyone (sorry for the bad English, it isn't my first language). I am using Tesseract for car plate recognition (OCR) in my final paper. I trained it with at least 20 samples of each character, extracted manually from actual plate images, without preprocessing. However, the results I am getting are not good enough.
Real-world images can be noisy and blurred, but can't Tesseract learn those patterns during training? Because of the complexity of the backgrounds (illumination variance, shadows, ...), even with preprocessing it is not possible to remove all the noise in every case; some dirt is still present in the image. So I decided to train with samples without preprocessing. Was that wrong?

The minimum proposed character height is 10 pixels. Is that too small? Is the number of samples not enough?

I send a complete license plate image (all characters) to Tesseract. Would I get a better result if I sent it character by character? I've read that Tesseract has segmentation algorithms internally, which are probably better than anything I would implement myself.

Is there any suggestion to improve this? Any article recommendation that could help me?
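Not from the post above, just a sketch of one common preprocessing idea: before deciding to skip preprocessing entirely, a global Otsu binarization often cleans up plate crops enough to help OCR. In practice you would use an image library (e.g. OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag), but here is a minimal pure-Python sketch of the algorithm itself, run on a toy "plate" of dark characters on a bright background:

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit gray values.

    Picks the cutoff that maximizes the between-class variance of the
    'dark' (character) and 'bright' (background) pixel populations.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg = 0.0      # weighted sum of the background (dark) class
    w_bg = 0          # pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy example: 40 "character" pixels (~30) on 60 "background" pixels (~200)
img = [30] * 40 + [200] * 60
t = otsu_threshold(img)
binary = [0 if p <= t else 255 for p in img]  # characters -> 0, background -> 255
```

The point of a step like this is that Tesseract internally binarizes anyway; handing it an already well-binarized image takes one source of variation out of both training and recognition.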

