Hey Lorenzo,
thanks a lot for your response. I've seen in the hOCR files of different
technical drawings that the Tesseract text segmentation has massive
problems recognizing zones with text, probably because of the various lines
and complex constructions within the technical drawing. Even the
Hi Simon, yes, I think the instructions you can give to the segmentation
step are quite limited, mostly the PSM parameter and I suppose a few minor
ones. There is something about tables but I've never used it and yours
might be too small for this to work. Yes, you should be able to see what is
Yes, in general I want to recognize this part "< 0,05 A", except that the <
is actually ∠, the character for angularity.
The segmentation process of Tesseract can't be edited, right? So you mean I
would need to make a Tesseract-independent program that localizes the
boxes, crops them out and
@zdenop:
Yes, because the characters start to show up (get recognized) only after
you run a few thousand iterations. For me, new characters start to get
recognized only after I run 5000 iterations. At that point, the base model
will have deteriorated terribly. It is now common knowledge
Hi Simon,
if I understand correctly how Tesseract works, it follows these steps:
- it segments the image into lines of text
- it then takes each individual line and slides a small window, 1px wide I
think, over it, from one end to the other. For each step the model outputs
a prediction. The model,
Thu, 23 Nov 2023 at 10:28, Des Bw wrote:
> If the original model lacks the ∠ symbol, fine tuning is not going to add
> it for you.
Really???
Tesseract documentation
If you are planning to train, you need to make sure that your images
contain all those variations: in thickness, angle etc. I don't know if
text2image can do that for you. You might need to do it manually; or use
some other tool.
On Thursday, November 23, 2023 at 12:39:21 PM UTC+3 Des Bw
Download the best model and try it. If it recognizes the symbol, that is
great. You can also look at the unicharset of the best model.
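
A minimal sketch of that unicharset check, under stated assumptions: the
unicharset has already been extracted from the traineddata file to a plain
text file (for example with combine_tessdata -u), its first line holds the
entry count, and every following line starts with the symbol followed by a
space. The file name in the usage note and both function names are made up
for illustration.

```python
from pathlib import Path


def unicharset_symbols(path):
    """Return the set of symbols listed in an extracted unicharset file.

    Assumes the usual text layout: line 1 is the number of entries, and
    each later line begins with the symbol, then a space, then properties.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    symbols = set()
    for line in lines[1:]:  # skip the entry-count line
        if not line.strip():
            continue
        symbols.add(line.split(" ", 1)[0])
    return symbols


def has_symbol(path, symbol):
    """True if `symbol` appears as an entry in the unicharset at `path`."""
    return symbol in unicharset_symbols(path)
```

Usage would look like has_symbol("eng.lstm-unicharset", "∠"), where the
file name is only an example. If this returns False, the model has no
output class for ∠ at all.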
Thanks a lot!
This is not possible with the tesstrain repository, right?
desal...@gmail.com wrote on Thursday, 23 November 2023 at 10:28:26
UTC+1:
> If the original model lacks the ∠ symbol, fine tuning is not going to add
> it for you. We have all gone through that process. To introduce a
If the original model lacks the ∠ symbol, fine tuning is not going to add
it for you. We have all gone through that process. To introduce a new
character, removing the top layer and training from there is the most
effective approach.
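
A rough sketch of what "removing the top layer" can look like when calling
lstmtraining directly, assuming the flags from the Tesseract training
documentation (--continue_from, --append_index, --net_spec). The helper
name, the default layer index, the net spec and the iteration count are
placeholders for illustration, not recommendations from this thread. The
helper only assembles the command; it does not run it.

```python
import shlex


def replace_top_layer_command(continue_from, traineddata, train_listfile,
                              model_output, append_index=5,
                              net_spec="[Lfx256 O1c111]",
                              max_iterations=3000):
    """Build an lstmtraining command that cuts the network at
    `append_index` and appends the layers described by `net_spec`.

    All default values are illustrative assumptions.
    """
    return [
        "lstmtraining",
        "--continue_from", continue_from,    # .lstm extracted from the best model
        "--traineddata", traineddata,        # starter traineddata with the new unicharset
        "--append_index", str(append_index), # keep layers below this index
        "--net_spec", net_spec,              # new top layers to train
        "--model_output", model_output,
        "--train_listfile", train_listfile,
        "--max_iterations", str(max_iterations),
    ]


if __name__ == "__main__":
    cmd = replace_top_layer_command(
        "eng.lstm", "eng_new/eng.traineddata",
        "eng.training_files.txt", "output/base")
    print(shlex.join(cmd))  # inspect the command before running it yourself
```

The starter traineddata passed here has to be built with a unicharset that
already contains ∠, otherwise the new top layer has no class for it.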
On Thursday, November 23, 2023 at 12:15:56 PM UTC+3
If I need to train new characters that are not recognized by a default
model, is fine tuning in this case the right approach?
One of these characters is the one for angularity: ∠
These symbols appear in technical drawings and should be recognised in
those. E.g. for the scenario in the
From my limited experience, you need a lot more data than that to train
from scratch. If you can't produce more data than that, you might first try
to fine-tune, and then train by removing the top layer of the best model.
On Wednesday, November 22, 2023 at 4:46:53 PM UTC+3 smon...@gmail.com