I am trying to understand the architecture of the OCR pipeline in Tesseract v5.0.1, specifically *the preprocessing that happens before the LSTM network during inference and training*.
The only material I could find is these 7-year-old documentation notes (https://github.com/tesseract-ocr/docs/tree/main/das_tutorial2016), and I am not sure whether they are still accurate.

1. Is the information I am looking for available anywhere in the online documentation (https://tesseract-ocr.github.io/tessdoc/)?
2. Is there a way to turn off the page layout analysis and the other preprocessing steps that run before the LSTM modules?

