OK, I tried it again and have to correct myself. When I use "gdt+eng", "eng" seems to be the dominant traineddata: no matter which order I use, the result is always the same as if I had only used "eng". "eng" on its own works fine. I downloaded the "eng" traineddata from the tessdata_best GitHub repository. I am using Tesseract 4.1.1, so my generated "gdt" traineddata should align with the traineddata from tessdata_best.
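To compare the orderings systematically, a small script can run the same image through each `-l` combination and group identical outputs. This is a minimal sketch, assuming the `tesseract` binary (4.x) is on PATH and the models are installed as `Common_gdt.traineddata` and `eng.traineddata` in the tessdata directory; the list of specs and the image path are hypothetical. `tesseract IMAGE stdout -l LANGS` prints the recognized text to standard output.

```python
"""Compare tesseract output across several -l language combinations.

Sketch only. Assumptions: `tesseract` (4.x) is on PATH and the model files
are installed as Common_gdt.traineddata and eng.traineddata (hypothetical
names taken from the thread).
"""
import shutil
import subprocess
import sys

# Language specs to compare; the two combined specs differ only in order.
LANG_SPECS = ["eng", "Common_gdt", "Common_gdt+eng", "eng+Common_gdt"]


def build_command(image_path, lang_spec, psm=None):
    """Return a tesseract CLI call that writes recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang_spec]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def group_identical(results):
    """Map each distinct (stripped) OCR output to the specs that produced it."""
    groups = {}
    for spec, text in results.items():
        groups.setdefault(text.strip(), []).append(spec)
    return groups


def run_all(image_path, specs=LANG_SPECS, psm=None):
    """Run tesseract once per spec and collect its stdout."""
    results = {}
    for spec in specs:
        proc = subprocess.run(build_command(image_path, spec, psm),
                              capture_output=True, text=True, check=True)
        results[spec] = proc.stdout
    return results


def main(args):
    # Do nothing unless an image was given and tesseract is available.
    if not args or shutil.which("tesseract") is None:
        print("usage: compare_langs.py IMAGE (requires tesseract on PATH)")
        return
    for text, specs in group_identical(run_all(args[0])).items():
        print(f"{' | '.join(specs)}: {text!r}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python compare_langs.py symbol.tif` (hypothetical file name). If both combined specs land in the same group as plain `eng`, that reproduces the behaviour described above; if the two orderings end up in different groups, the difference itself is a useful hint.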
[email protected] wrote on Monday, 20 November 2023 at 10:27:23 UTC+1:

> Going out on a limb here, but does '-l eng' on its own deliver any text for you?
>
> The next thing I would look into, if I were you, is whether my 'eng' traineddata has the same (LSTM aka v4, I suppose) support listed as your gdt traineddata. I've seen it happen that those do not align.
>
> There's a Tesseract tool to list the traineddata engine features (I forgot the name/CLI args, sorry) and one to merge traineddata files (combine_something, but I'd have to look it up, so you'll be as fast as me with Google + doc search), but my *hunch* is that you won't need the combine tool. What I've seen so far is that Tesseract picks an engine (the psm setting drives this, IIRC) and then pumps the image through all loaded languages on a segment-by-segment basis. (IIRC, so YMMV ;-) )
>
> (The bit I'm wondering about now myself is this: there was some sort of criterion in the code for deciding when to try (or use?) multiple language results. It just /might/ be what's causing trouble, but I would have to dig deep into the code for that, and it doesn't rate above "wild crazy guess" anyway. So better take the same route and first check that your installed 'eng' database is doing what it's supposed to on its own.)
>
> The next sane thing to try is flipping them around, i.e. "eng+gdt" instead of "gdt+eng", to see whether and /how/ the results change, as that might give us all a hint about what's going on in there.
>
> On Mon, 20 Nov 2023, 09:23 Simon, <[email protected]> wrote:
>
>> Hello everybody,
>>
>> Right now I am working with Tesseract to train it on new symbols. For this I used TIF images containing only the desired symbol. I trained with the tesstrain repository and about 4000 training images. At the end of the procedure I got the traineddata file for my model Common_gdt.
>> Apart from the symbol(s) I trained in the Common_gdt model, numbers should also be recognized.
>> Obviously, if I only use Common_gdt, Tesseract recognizes only the symbols it was trained on, but no numbers.
>> To solve this problem I used -l Common_gdt+eng, which should use both traineddata files. But when I use them like this, it is as if "eng" doesn't do anything: the results are the same as if I had used only Common_gdt.
>>
>> Does anyone have an idea how traineddata files can be combined?
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/9ee1df96-eef7-4f93-b93a-2c7914ab52c9n%40googlegroups.com
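The tool the reply is trying to remember is most likely `combine_tessdata`, which is normally installed alongside Tesseract: `combine_tessdata -d FILE.traineddata` lists the components stored in a traineddata file, and the same tool can also extract and combine components. The sketch below uses that listing to check whether a model carries an LSTM network (`lstm`) and/or legacy-engine data (`inttemp`). It rests on assumptions: `combine_tessdata` is on PATH, and each component line of the listing looks like `17:lstm:size=..., offset=...`. That format is recalled from Tesseract 4.x, so compare it with your own output before relying on the parser.

```python
"""Check which engine components a .traineddata file contains.

Sketch only. Assumptions: `combine_tessdata` is on PATH, and its `-d`
listing prints one line per component as "<index>:<name>:size=..., ...".
"""
import re
import shutil
import subprocess
import sys

# Assumed line format, e.g. "17:lstm:size=401636, offset=192".
# A leading "." before the name is tolerated in case the suffix is printed.
_COMPONENT_RE = re.compile(r"^\s*(\d+):\.?([A-Za-z0-9_-]+):")


def parse_components(listing):
    """Return the component names found in a `combine_tessdata -d` listing."""
    names = []
    for line in listing.splitlines():
        m = _COMPONENT_RE.match(line)
        if m:
            names.append(m.group(2))
    return names


def engine_support(components):
    """Summarize which engines the listed components suggest."""
    return {
        # "lstm" holds the LSTM network used by the v4 engine.
        "lstm": "lstm" in components,
        # "inttemp" is a legacy (pre-LSTM) classifier component.
        "legacy": "inttemp" in components,
    }


def list_components(traineddata_path):
    """Run combine_tessdata -d and parse its listing."""
    # Merge stderr into stdout: the listing may be printed on stderr.
    proc = subprocess.run(["combine_tessdata", "-d", traineddata_path],
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, check=True)
    return parse_components(proc.stdout)


def main(paths):
    # Do nothing unless files were given and the tool is available.
    if not paths or shutil.which("combine_tessdata") is None:
        print("usage: check_traineddata.py FILE.traineddata ... "
              "(requires combine_tessdata on PATH)")
        return
    for path in paths:
        print(path, engine_support(list_components(path)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it on both `eng.traineddata` and `Common_gdt.traineddata` shows at a glance whether the two files report the same engine support; a mismatch would be the kind of misalignment the reply warns about.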

