Re: [tesseract-ocr] Re: Training from Scratch

2023-11-24 Thread Lorenzo Bolzani
Hi Simon, if I understand correctly how tesseract works, it follows these steps:
- it segments the image into lines of text
- it then takes each individual line and slides a small window, 1px wide I think, over it, from one end to the other. For each step the model outputs a prediction. The model,
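As a rough illustration of the sliding-window step described above, here is a minimal Python sketch. It is not tesseract's code: `slide_over_line` and `toy_predict` are hypothetical names, the line image is a plain list of pixel rows, and the "model" is a stateless stand-in, whereas tesseract's recurrent network carries context from one step to the next.

```python
from typing import Callable, List, Sequence

Column = List[int]  # one 1px-wide vertical slice of the line, top to bottom


def slide_over_line(line: Sequence[Sequence[int]],
                    predict: Callable[[Column], str]) -> List[str]:
    """Slide a 1px-wide window left to right; collect one prediction per step."""
    if not line:
        return []
    width = len(line[0])
    outputs = []
    for x in range(width):
        column = [row[x] for row in line]  # the window at horizontal position x
        outputs.append(predict(column))
    return outputs


def toy_predict(column: Column) -> str:
    """Hypothetical stand-in for the model: 'ink' if any pixel is set."""
    return "ink" if any(p > 0 for p in column) else "blank"


# A tiny 2x4 "line image" (assumed layout: rows of pixel values).
line_image = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
steps = slide_over_line(line_image, toy_predict)
print(steps)
```

The point is only that a line of width W yields W predictions, one per window position; turning those per-step outputs into text is a separate stage not shown here.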

Re: [tesseract-ocr] Re: Training from Scratch

2023-11-24 Thread Des Bw
@zdenop: Yes, because the characters start to show up (get recognized) only after you run a few thousand iterations. For me, new characters start to get recognized only after I run 5000 iterations. At that point, the base model will have deteriorated terribly. It is now common knowledge