I just realised some of the output underneath "Trying word using lang fo, oem 0" might be useful information! Here it is:

Running NoDangerousAmbig() for 5 [35 ]0 3 [33 ]0 . [2e ]p
Looking for replaceable ngrams starting with 5 [35 ]0:
Looking for replaceable ngrams starting with 3 [33 ]0:
Looking for replaceable ngrams starting with . [2e ]p:
Looking for ambiguous ngrams starting with 5 [35 ]0:
Looking for ambiguous ngrams starting with 3 [33 ]0:
Looking for ambiguous ngrams starting with . [2e ]p:
53. ViterbiStateEntry(NEW) with ratings_sum=43.4269 length=3 cost=54.283619 top_choice_flags=0x19 XH_GOOD
New Best Word Choice : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 .
state: 1 1 1
C -5.085 -3.497 -1.978
Stopper: 53. (word=n, case=y, xht_ok=NORMAL=[0,256])
Running NoDangerousAmbig() for 5 [35 ]0 3 [33 ]0 n [6e ]a
Looking for replaceable ngrams starting with 5 [35 ]0:
Looking for replaceable ngrams starting with 3 [33 ]0:
Looking for replaceable ngrams starting with n [6e ]a:
candidate ngram: n ( 20 )
current ngram from spec: n p C ( 20 6 24 )
comparison result: -1
Looking for ambiguous ngrams starting with 5 [35 ]0:
Looking for ambiguous ngrams starting with 3 [33 ]0:
Looking for ambiguous ngrams starting with n [6e ]a:
candidate ngram: n ( 20 )
current ngram from spec: n ( 20 )
comparison result: 0
fixpt+=(2 3 0 1 ri)
found ambiguity: ri ( 85 )
candidate ngram: n ( 20 )
current ngram from spec: n ( 20 )
comparison result: 0
fixpt+=(2 3 0 1 tr)
found ambiguity: tr ( 114 )
candidate ngram: n ( 20 )
current ngram from spec: n ( 20 )
comparison result: 0
fixpt+=(2 3 0 1 ij)
found ambiguity: ij ( 116 )
candidate ngram: n ( 20 )
current ngram from spec: n i ( 20 16 )
comparison result: -1
Resulting ambig_blob_choices:
r0.00 c0.00 x[0,1]: 3 5 [35 ]0
r0.00 c0.00 x[0,1]: 27 3 [33 ]0
r0.00 c0.00 x[0,1]: 20 n [6e ]a
r-1.00 c0.00 x[0,1]: 85 ri [72 69 ]
r-1.00 c0.00 x[0,1]: 114 tr [74 72 ]
r-1.00 c0.00 x[0,1]: 116 ij [69 6a ]
53n ViterbiStateEntry(NEW) with ratings_sum=43.4676 length=3 cost=67.374825 top_choice_flags=0x2 inconsistent=(punc 0 case 0 chartype 1 script 0 font 0) XH_GOOD
New Secondary Word Choice : 53n : R=67.3748, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 n
state: 1 1 1
C -5.085 -3.497 -2.159
Running NoDangerousAmbig() for 5 [35 ]0 3 [33 ]0 H [48 ]A
Looking for replaceable ngrams starting with 5 [35 ]0:
Looking for replaceable ngrams starting with 3 [33 ]0:
Looking for replaceable ngrams starting with H [48 ]A:
candidate ngram: H ( 51 )
current ngram from spec: H p p ( 51 6 6 )
comparison result: -1
Looking for ambiguous ngrams starting with 5 [35 ]0:
Looking for ambiguous ngrams starting with 3 [33 ]0:
Looking for ambiguous ngrams starting with H [48 ]A:
53H ViterbiStateEntry(NEW) with ratings_sum=43.4944 length=3 cost=67.416374 top_choice_flags=0x4 inconsistent=(punc 0 case 0 chartype 1 script 0 font 0) XH_GOOD
New Secondary Word Choice : 53H : R=67.4164, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 H
state: 1 1 1
C -5.085 -3.497 -2.279
Filtering against best choice : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 .
state: 1 1 1
C -5.085 -3.497 -1.978
Best Raw Choice : 53. : R=43.4269, C=-5.08463, F=1, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 .
state: 1 1 1
C -5.085 -3.497 -1.978
Cooked Choice #0 : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 .
state: 1 1 1
C -5.085 -3.497 -1.978
Cooked Choice #1 : 53n : R=67.3748, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 n
state: 1 1 1
C -5.085 -3.497 -2.159
Cooked Choice #2 : 53H : R=67.4164, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 H
state: 1 1 1
C -5.085 -3.497 -2.279
Rejecter: 5 [35 ]0 3 [33 ]0 . [2e ]p (word=n, case=y, unambig=y, multiple=y)
Best choice: accepted=0, adaptable=0, done=0 : Lang result : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
pos NORM NORM NORM
str 5 3 .
state: 1 1 1
C -5.085 -3.497 -1.978

On Friday, 10 August 2018 11:31:28 UTC+1, [email protected] wrote:
>
> Hi Shree, thanks for your patience and help!
>
> I have managed to produce the tesseract.log file with your help. Now I'm
> trying to understand it a bit more. Here is a quick snippet of the output I
> want to show you:
>
> Rejecter: 5 [35 ]0 3 [33 ]0 . [2e ]p (word=n, case=y, unambig=y, multiple=y)
> Best choice: accepted=0, adaptable=0, done=0 : Lang result : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
> pos NORM NORM NORM
> str 5 3 .
> state: 1 1 1
> C -5.085 -3.497 -1.978
> 1 new words worse than 1 old words: r: 54.2836 v 1.81739 c: -5.08463 v -3.90478 valid dict: 0 v 0
> Already done word with lang eng at:Bounding box=(499,2)->(514,1361)
> Processing word with lang eng at:Bounding box=(672,1253)->(762,1288)
> Trying word using lang eng, oem 1
> Best choice: accepted=1, adaptable=0, done=1 : Lang result : Date : R=2.05422, C=-0.662761, F=1, Perm=8, xht=[0,3.40282e+038], ambig=0
> pos NORM NORM NORM NORM
> str D a t e
> state: 1 1 1 1
> C -0.085 -0.095 -0.088 -0.085
> 1 new words better than 0 old words: r: 2.05422 v 0 c: -0.662761 v 0 valid dict: 1 v 0
> Processing word with lang eng at:Bounding box=(521,1084)->(842,1156)
> Trying word using lang eng, oem 1
> Best choice: accepted=1, adaptable=0, done=1 : Lang result : May : R=1.64554, C=-0.733805, F=1, Perm=8, xht=[0,3.40282e+038], ambig=0
> pos NORM NORM NORM
> str M a y
> state: 1 1 1
> C -0.092 -0.085 -0.105
> Best choice: accepted=0, adaptable=0, done=1 : Lang result : 182.2. : R=4.51301, C=-4.37332, F=1, Perm=6, xht=[0,3.40282e+038], ambig=0
> pos NORM NORM NORM NORM NORM NORM
> str 1 8 2 . 2 .
> state: 1 1 1 1 1 1
> C -0.116 -0.204 -0.176 -0.612 -0.210 -0.625
> 1 new words better than 0 old words: r: 1.64554 v 0 c: -0.733805 v 0 valid dict: 1 v 0
> 1 new words better than 0 old words: r: 4.51301 v 0 c: -4.37332 v 0 valid dict: 0 v 0
> Trying word using lang fo, oem 0
>
> As you can see on the very last line, it says "Trying word using lang fo".
> I can see this line repeated about 5 times, so it seems that it does
> sometimes use the fo dictionary. However, I wonder how it works. How does it
> know when to use fo after looking at eng?
> Does it only look at fo when it
> sees a box coordinate for a letter/word but is unable to find letters to
> assign to it, and so uses the next dictionary? If so, how can it be that
> entering "fo+eng" in the command instead of "eng+fo" makes no
> difference to which dictionary is given priority in the
> search?
>
> On Thursday, 9 August 2018 11:29:11 UTC+1, shree wrote:
>>
>> The output tesseract.log file should be produced in the directory from which
>> you are running the command, usually where your OCR output is created.
>>
>> On Thu, Aug 9, 2018 at 3:48 PM <[email protected]> wrote:
>>
>>> Hello Shree, thank you for your prompt reply.
>>>
>>> I have now changed the logfile as instructed. Where can I find the
>>> output tesseract.log file? Will it be produced in the same location as the
>>> logfile, in C:\Program Files (x86)\Tesseract-OCR\tessdata\configs? I'm
>>> guessing the tesseract.log file will be produced once I've used logfile in
>>> the commands.
>>>
>>> Kind Regards,
>>>
>>> Damon
>>>
>>> On Wednesday, 8 August 2018 19:07:02 UTC+1, shree wrote:
>>>>
>>>> I think this could happen if your new traineddata is not trained to as high
>>>> an accuracy level as the eng traineddata.
>>>>
>>>> You can set up a debug log to verify this. See
>>>> https://github.com/tesseract-ocr/tesseract/issues/1275#issuecomment-360367865
>>>> for details.
>>>>
>>>> On Wed, Aug 8, 2018 at 6:04 PM <[email protected]> wrote:
>>>>
>>>>> I'm trying to use a combination of two traineddata dictionaries
>>>>> together because one of them recognises specific numbers
>>>>> better than the other.
>>>>>
>>>>> Here is an example of the code line.
>>>>>
>>>>> $codeLine .= '<br>magick convert "'.$filePath.'" -quality 90 -density 300x300 -units PixelsPerInch "'.$output.'.jpg"'; //
>>>>> $codeLine .= '<br>tesseract "'.$output.'.jpg" "'.$output.'" -l fo+eng txt pdf';
>>>>>
>>>>> Despite the fact that I put "fo" in front (this is the one that recognises
>>>>> the number 4 better), it still gives me an output text file that is
>>>>> identical to the "eng" dictionary output when I run that on its
>>>>> own.
>>>>>
>>>>> For some reason, it not only prioritises eng but also
>>>>> ignores the fo traineddata file completely.
>>>>>
>>>>> The "fo" file definitely works, as I've tested it on its own.
>>>>>
>>>>> I have attached an image example of the text I'd like to OCR and the
>>>>> two relevant traineddata files.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/1a5a6768-baeb-4ba9-9cbd-adda6cba957c%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
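
One way to check whether the language order changes anything is to run the same image once per language setting and compare the resulting text files. The following is only a rough sketch under a few assumptions: `tesseract` is on the PATH, both the `fo` and `eng` traineddata files are installed, and the helper names (`build_command`, `ocr_text`, `compare`) are made up for illustration. Only the `-l` option and the `txt` output config are taken from the command quoted in the thread.

```python
import subprocess
from pathlib import Path


def build_command(image, out_base, langs):
    """Build a tesseract command line like the one quoted in the thread."""
    return ["tesseract", str(image), str(out_base), "-l", langs, "txt"]


def ocr_text(image, langs, workdir="."):
    """Run tesseract for one language setting and return the recognised text.

    Assumes the tesseract binary is installed; raises if the command fails.
    """
    out_base = Path(workdir) / ("out_" + langs.replace("+", "_"))
    subprocess.run(build_command(image, out_base, langs), check=True)
    # With the txt config, tesseract writes <out_base>.txt
    return Path(str(out_base) + ".txt").read_text(encoding="utf-8")


def compare(image, settings=("eng", "fo", "eng+fo", "fo+eng")):
    """Return {setting: text} so the outputs can be compared pairwise."""
    return {langs: ocr_text(image, langs) for langs in settings}
```

Calling `compare("scan.jpg")` (a hypothetical file name) and checking, for example, whether the `"fo+eng"` entry equals the `"eng"` entry would confirm or rule out the "identical to eng" observation without reading the text files by eye.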

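
To see how often each language is actually tried, the debug log can be summarised instead of read line by line. This is a minimal sketch: the two regular expressions are inferred from the log lines quoted in this thread, not from any documented log format, so other Tesseract versions may need different patterns. The function name `summarise` is hypothetical, and the sample text is assembled from lines in the excerpt above.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt in this thread (an assumption,
# not a documented format).
TRYING = re.compile(r"Trying word using lang (\w+), oem (\d+)")
RESULT = re.compile(
    r"Best choice: accepted=(\d), adaptable=\d, done=\d : "
    r"Lang result : (.*?) : R=([-\d.]+), C=([-\d.]+)"
)


def summarise(log_text):
    """Count attempts per (lang, oem) and collect the 'Lang result' lines."""
    attempts = Counter(TRYING.findall(log_text))
    results = [
        {"word": word, "accepted": acc == "1",
         "rating": float(r), "certainty": float(c)}
        for acc, word, r, c in RESULT.findall(log_text)
    ]
    return attempts, results


# Sample lines copied from the excerpt quoted in this thread.
sample = """\
Trying word using lang eng, oem 1
Best choice: accepted=1, adaptable=0, done=1 : Lang result : Date : R=2.05422, C=-0.662761, F=1, Perm=8, xht=[0,3.40282e+038], ambig=0
Trying word using lang fo, oem 0
Best choice: accepted=0, adaptable=0, done=0 : Lang result : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0
"""

attempts, results = summarise(sample)
for (lang, oem), count in attempts.items():
    print(f"lang={lang} oem={oem}: tried {count} time(s)")
for entry in results:
    print(entry)
```

On a real run, reading the whole tesseract.log into `log_text` would show how many words were attempted with `fo` versus `eng`, and which results were accepted.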
