Re: [tesseract-ocr] Re: Training from Scratch

2023-11-29 Thread Simon
Hey Lorenzo, thanks a lot for your response. I've seen in the hOCR files of different technical drawings that Tesseract's text segmentation has massive problems recognizing zones with text, probably because of the various lines and complex constructions within the technical drawing. Even the
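
For readers who want to check which text zones Tesseract actually found, here is a minimal sketch that pulls bounding boxes out of hOCR output using only the Python standard library. It assumes the usual hOCR convention of a `bbox x0 y0 x1 y1` entry in each element's `title` attribute; the helper names (`HocrBoxes`, `hocr_boxes`) and the sample markup are made up for illustration and are not from the thread.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 10 20 200 45; ..."
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrBoxes(HTMLParser):
    """Collects the bbox of every element carrying a given hOCR class."""

    def __init__(self, wanted_class="ocr_line"):
        super().__init__()
        self.wanted_class = wanted_class
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.wanted_class not in classes:
            return
        match = BBOX_RE.search(attrs.get("title") or "")
        if match:
            self.boxes.append(tuple(int(v) for v in match.groups()))


def hocr_boxes(hocr_text, wanted_class="ocr_line"):
    """Return a list of (x0, y0, x1, y1) tuples for the requested class."""
    parser = HocrBoxes(wanted_class)
    parser.feed(hocr_text)
    return parser.boxes


# Hypothetical sample, shaped like hOCR output but written by hand.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 800 600'>
<span class='ocr_line' title='bbox 10 20 200 45; baseline 0 -3'>
<span class='ocrx_word' title='bbox 10 20 60 45; x_wconf 91'>0,05</span>
</span></div>"""

if __name__ == "__main__":
    print(hocr_boxes(SAMPLE))               # line-level zones
    print(hocr_boxes(SAMPLE, "ocrx_word"))  # word-level zones
```

Comparing these boxes against the drawing shows quickly whether a tolerance frame was missed entirely or merged with neighbouring line work.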

Re: [tesseract-ocr] Re: Training from Scratch

2023-11-27 Thread Lorenzo Bolzani
Hi Simon, yes, I think the instructions you can give to the segmentation step are quite limited, mostly the PSM parameter and I suppose a few minor ones. There is something about tables but I've never used it and yours might be too small for this to work. Yes, you should be able to see what is
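
As a rough illustration of the PSM parameter mentioned above, the sketch below only builds a `tesseract` command line with an explicit `--psm` value and, optionally, the `hocr` config to get hOCR output. The function name and the file names are hypothetical; actually running the command assumes the `tesseract` binary is installed and on the PATH.

```python
def tesseract_command(image_path, output_base, psm=6, hocr=False):
    """Build an argument list for the tesseract CLI (does not run it).

    psm is the page segmentation mode; Tesseract accepts 0 to 13, e.g.
    6 = single uniform block of text, 7 = single text line,
    11 = sparse text.
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    cmd = ["tesseract", image_path, output_base, "--psm", str(psm)]
    if hocr:
        cmd.append("hocr")  # config file that switches output to hOCR
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run(cmd, check=True)
    # to execute it on a machine where tesseract is installed.
    print(tesseract_command("drawing.png", "drawing", psm=11, hocr=True))
```

Trying a few modes on the same drawing (for example sparse text versus a single block) is an inexpensive way to see how much the segmentation step can be influenced before writing any custom code.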

Re: [tesseract-ocr] Re: Training from Scratch

2023-11-25 Thread Simon
Yes, in general I want to recognize this part "< 0,05 A", except that the < is actually ∠, the character for angularity. The segmentation process of Tesseract can't be edited, right? So you mean I would need to make a Tesseract-independent program that localizes the boxes, crops them out and
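
One small piece of such an external localize-and-crop program can be sketched as follows: given a box found by some detector (which is not shown and is purely an assumption here), pad it slightly and clamp it to the image bounds before cropping. The function name and the default margin are illustrative choices, not something from the thread.

```python
def padded_crop_box(box, image_width, image_height, margin=4):
    """Grow a (left, upper, right, lower) box by `margin` pixels per side,
    clamped so it never leaves the image."""
    left, upper, right, lower = box
    if left >= right or upper >= lower:
        raise ValueError("box must have positive width and height")
    return (
        max(0, left - margin),
        max(0, upper - margin),
        min(image_width, right + margin),
        min(image_height, lower + margin),
    )


if __name__ == "__main__":
    # Hypothetical detector output for one tolerance frame on an 800x600 scan.
    print(padded_crop_box((2, 100, 150, 130), 800, 600))
```

The tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each padded box can be cut out and passed to the OCR step as its own small image.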

Re: [tesseract-ocr] Re: Training from Scratch

2023-11-24 Thread Des Bw
@zdenop: Yes, because the characters start to show up (get recognized) only after you run a few thousand iterations. For me, new characters start to get recognized only after I run 5000 iterations. By that point, the base model will have deteriorated terribly. It is now common knowledge

Re: [tesseract-ocr] Re: Training from Scratch

2023-11-24 Thread Lorenzo Bolzani
Hi Simon, if I understand correctly how Tesseract works, it follows these steps:
- it segments the image into lines of text
- it then takes each individual line and slides a small window, 1px wide I think, over it, from one end to the other. For each step the model outputs a prediction. The model,

Re: [tesseract-ocr] Re: Training from Scratch

2023-11-23 Thread Zdenko Podobny
On Thu, 23 Nov 2023 at 10:28, Des Bw wrote:
> If the original model lacks the ∠ symbol, fine-tuning is not going to add it for you.
Really??? Tesseract documentation