Thanks a lot! This is not possible with the tesstrain repository, right?

[email protected] wrote on Thursday, 23 November 2023 at 10:28:26 UTC+1:
> If the original model lacks the ∠ symbol, fine-tuning is not going to add
> it for you. We have all gone through that process. To introduce a new
> character, removing the top layer and training from there is the most
> effective approach.
>
> On Thursday, November 23, 2023 at 12:15:56 PM UTC+3 [email protected]
> wrote:
>
>> If I need to train new characters that are not recognized by a default
>> model, is fine-tuning the right approach in this case?
>> One of these characters is the symbol for angularity: ∠
>>
>> This symbol appears in technical drawings and should be recognized in
>> those. E.g. for the scenario in the following picture, tesseract should
>> recognize this symbol.
>>
>> [image: angularity.png]
>>
>> Also, here is one of the images I tried to train with:
>> [image: angularity_0_r0.jpg]
>> They all look pretty similar to this one. The things that change are the
>> angle, the proportions and the thickness of the lines. All examples have
>> this 64x64 pixel box around them.
>>
>> Is fine-tuning the right approach for this scenario? I only find
>> information on fine-tuning for specific fonts. Also, for fine-tuning the
>> "tesstrain" repository would not be needed, as it is used for training
>> from scratch, correct?
>> [email protected] wrote on Wednesday, 22 November 2023 at 15:27:02
>> UTC+1:
>>
>>> From my limited experience, you need a lot more data than that to train
>>> from scratch. If you can't produce more data than that, you might first
>>> try to fine-tune, and then train by removing the top layer of the best
>>> model.
>>>
>>> On Wednesday, November 22, 2023 at 4:46:53 PM UTC+3 [email protected]
>>> wrote:
>>>
>>>> As it is not properly possible to combine my traineddata from scratch
>>>> with an existing one, I have decided to also train my traineddata model
>>>> on numbers. Therefore I wrote a script which synthetically generates
>>>> ground-truth data with text2image.
>>>> This script uses dozens of different fonts and creates numbers in the
>>>> following formats:
>>>> X.XXX
>>>> X.XX
>>>> X,XX
>>>> X,XXX
>>>> I generated 10,000 files to train the numbers. But unfortunately
>>>> numbers get recognized pretty poorly with the best model (most of the
>>>> time only "0.", "0" or "0," gets recognized).
>>>> So I wanted to ask whether this is not enough training (ground-truth)
>>>> data for proper recognition when I train several fonts.
>>>> Thanks in advance for your help.
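
For reference, the four number formats listed in the earliest message above can be generated with a few lines of Python. This is only a minimal sketch and not the poster's actual script: the names `FORMATS`, `random_number` and `make_lines` are hypothetical, and it assumes that every "X" stands for one uniformly random digit and that the four formats should be used equally often. Each resulting string could then be written to a text file and rendered with text2image.

```python
import random

# The four formats named in the message; "X" is assumed to mean one random digit.
FORMATS = ["X.XXX", "X.XX", "X,XX", "X,XXX"]


def random_number(fmt, rng):
    """Replace every 'X' in fmt with a random digit; keep '.' and ',' as-is."""
    return "".join(str(rng.randint(0, 9)) if ch == "X" else ch for ch in fmt)


def make_lines(count, seed=0):
    """Return `count` ground-truth strings, cycling through the four formats."""
    rng = random.Random(seed)  # fixed seed so the generated set is reproducible
    return [random_number(FORMATS[i % len(FORMATS)], rng) for i in range(count)]


if __name__ == "__main__":
    for line in make_lines(8):
        print(line)
```

Cycling through the formats and drawing each digit uniformly keeps the leading digit and both separators evenly represented, which makes it easier to rule out a skewed text distribution when recognition collapses to outputs such as "0." or "0,".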

