The interesting part is that TextRecognitionDataGenerator also generates Tesseract-compatible box files. However, I have found no easy way to produce the training files (.lstmf, .tif and the like) from the images and box files that TextRecognitionDataGenerator creates. I am pretty sure that users with a little more experience already know how to do that.

On Wednesday, November 8, 2023 at 8:51:51 AM UTC+3 Des Bw wrote:
> text2image is a great script shipped with Tesseract. It generates
> synthetic training data by rendering images from text files, and it has a
> few control parameters to make the generated images resemble scanned images.
>
> But I have lately learned that the images generated by text2image are
> nowhere near as realistic as the ones generated by
> https://github.com/Belval/TextRecognitionDataGenerator. The latter tool
> has more powerful controls to produce exactly the type of image you want to
> generate.
>
>    - Has anyone found a way of making Tesseract work with other text
>    generation tools such as TextRecognitionDataGenerator?
>    - If so, what was the experience like?
>    - And for the developers, is there any way to replace text2image
>    with TextRecognitionDataGenerator?

