Hey Lorenzo,
thanks a lot for your response. I've seen in the hOCR files of different
technical drawings that the Tesseract text segmentation has massive
problems recognizing zones with text, probably because of the various lines
and complex constructions within the technical drawing. Even the
Hi Simon, yes, I think the instructions you can give to the segmentation
step are quite limited, mostly the PSM parameter and I suppose a few minor
ones. There is something about tables but I've never used it and yours
might be too small for this to work. Yes, you should be able to see what is
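
For reference, the PSM (page segmentation mode) mentioned above is passed on the command line as `--psm N`, and appending the `hocr` config makes Tesseract write an hOCR file in which the detected zones can be inspected. A minimal sketch that only builds the command without running it; the file names are placeholders, and which mode suits a technical drawing is something to try out, not a recommendation:

```python
import shlex


def tesseract_command(image_path, output_base, psm=6, hocr=True):
    """Build (but do not run) a tesseract CLI invocation as an argument list."""
    # Tesseract 4/5 document page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    cmd = ["tesseract", image_path, output_base, "--psm", str(psm)]
    if hocr:
        # "hocr" is a config name that switches the output format to hOCR.
        cmd.append("hocr")
    return cmd


# Placeholder file names; 11 is the "sparse text" mode, 7 treats the image
# as a single text line.
print(shlex.join(tesseract_command("drawing.png", "drawing", psm=11)))
```

The returned list can be handed to `subprocess.run` once Tesseract is installed.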
Yes, in general I want to recognize this part "< 0,05 A", except that the <
is actually ∠, the character for angularity.
The segmentation process of Tesseract can't be edited, right? So you mean I
would need to write a Tesseract-independent program that localizes the
boxes, crops them out and
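
The cropping part of such a Tesseract-independent step could look roughly like the sketch below. It assumes the boxes have already been located by some other means (not shown here) and are given as `(x0, y0, x1, y1)` pixel coordinates; the image is a plain list of pixel rows so the example stays self-contained, and the `crop` helper and its `pad` parameter are made up for illustration:

```python
def crop(image, bbox, pad=0):
    """Cut bbox = (x0, y0, x1, y1) out of image (a list of pixel rows).

    x1 and y1 are exclusive. The box is grown by `pad` pixels on every
    side and then clamped to the image borders.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = bbox
    x0 = max(0, x0 - pad)
    y0 = max(0, y0 - pad)
    x1 = min(width, x1 + pad)
    y1 = min(height, y1 + pad)
    return [row[x0:x1] for row in image[y0:y1]]


# Tiny 4x5 test image whose pixel value encodes its position (row*10 + col).
image = [[r * 10 + c for c in range(5)] for r in range(4)]
tolerance_box = crop(image, (1, 1, 3, 3))
print(tolerance_box)
```

With a real image library the same idea is a single slicing or crop call; each cropped box can then be saved and passed to Tesseract on its own.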
@zdenop:
Yes, because the characters start to show up (get recognized) only after
you run a few thousand iterations. For me, new characters start to get
recognized only after I run 5000 iterations. At that point, the base model
will have deteriorated terribly. It is now common knowledge
Hi Simon,
if I understand correctly how Tesseract works, it follows these steps:
- it segments the image into lines of text
- it then takes each individual line and slides a small window, 1px wide I
think, over it, from one end to the other. For each step the model outputs
a prediction. The model,
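
A toy illustration of the sliding-window idea described above: a narrow window moves across one text line and a prediction is produced at every position. This is only a sketch of the description, not Tesseract's actual code; `toy_predict` is a made-up stand-in for the real model, and the 1px width is the guess from the message:

```python
def column_windows(line_image, width=1):
    """Yield (x, window) for every horizontal position of a sliding window.

    line_image is a list of pixel rows for one text line; each window
    contains the same rows restricted to columns x .. x + width - 1.
    """
    n_cols = len(line_image[0]) if line_image else 0
    for x in range(0, n_cols - width + 1):
        yield x, [row[x:x + width] for row in line_image]


def toy_predict(window):
    """Hypothetical stand-in for the model: '#' if the window has ink."""
    return "#" if any(px for row in window for px in row) else " "


# Two-row "line" with ink (1) in columns 1, 2 and 5.
line = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
]
predictions = [toy_predict(window) for _, window in column_windows(line)]
print(predictions)
```

The point is only that the output is a sequence with one entry per window position, which then has to be turned into characters.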
On Thu, 23 Nov 2023 at 10:28, Des Bw wrote:
> If the original model lacks the ∠ symbol, fine tuning is not going to add
> it for you.
Really???
Tesseract documentation