Hi Shree, thanks for your patience and help!

I have managed to produce the tesseract.log file with your help. Now I'm trying to understand it a bit more. Here is a quick snippet of the output I want to show you:

*Rejecter: 5 [35 ]0 3 [33 ]0 . [2e ]p (word=n, case=y, unambig=y, multiple=y)*
*Best choice: accepted=0, adaptable=0, done=0 : Lang result : 53. : R=54.2836, C=-5.08463, F=1.5, Perm=2, xht=[0,256], ambig=0*
*pos NORM NORM NORM*
*str 5 3 .*
*state: 1 1 1 *
*C -5.085 -3.497 -1.978*
*1 new words worse than 1 old words: r: 54.2836 v 1.81739 c: -5.08463 v -3.90478 valid dict: 0 v 0*
*Already done word with lang eng at:Bounding box=(499,2)->(514,1361)*
*Processing word with lang eng at:Bounding box=(672,1253)->(762,1288)*
*Trying word using lang eng, oem 1*
*Best choice: accepted=1, adaptable=0, done=1 : Lang result : Date : R=2.05422, C=-0.662761, F=1, Perm=8, xht=[0,3.40282e+038], ambig=0*
*pos NORM NORM NORM NORM*
*str D a t e*
*state: 1 1 1 1 *
*C -0.085 -0.095 -0.088 -0.085*
*1 new words better than 0 old words: r: 2.05422 v 0 c: -0.662761 v 0 valid dict: 1 v 0*
*Processing word with lang eng at:Bounding box=(521,1084)->(842,1156)*
*Trying word using lang eng, oem 1*
*Best choice: accepted=1, adaptable=0, done=1 : Lang result : May : R=1.64554, C=-0.733805, F=1, Perm=8, xht=[0,3.40282e+038], ambig=0*
*pos NORM NORM NORM*
*str M a y*
*state: 1 1 1 *
*C -0.092 -0.085 -0.105*
*Best choice: accepted=0, adaptable=0, done=1 : Lang result : 182.2. : R=4.51301, C=-4.37332, F=1, Perm=6, xht=[0,3.40282e+038], ambig=0*
*pos NORM NORM NORM NORM NORM NORM*
*str 1 8 2 . 2 .*
*state: 1 1 1 1 1 1 *
*C -0.116 -0.204 -0.176 -0.612 -0.210 -0.625*
*1 new words better than 0 old words: r: 1.64554 v 0 c: -0.733805 v 0 valid dict: 1 v 0*
*1 new words better than 0 old words: r: 4.51301 v 0 c: -4.37332 v 0 valid dict: 0 v 0*
*Trying word using lang fo, oem 0*
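
To see how often each language is actually tried, the log snippet above can be tallied with a short script. This is only a minimal sketch in Python, assuming the log keeps the two line formats shown in the snippet ("Trying word using lang ..." and "Best choice: ..."). The function name summarise_log is made up for illustration, and reading R and C as the word's rating and certainty is an assumption, not something the log itself states.

```python
import re
from collections import Counter

# Patterns copied from the line formats in the log snippet above.
TRY_RE = re.compile(r"Trying word using lang (\w+), oem (\d+)")
BEST_RE = re.compile(
    r"Best choice: accepted=(\d+), adaptable=(\d+), done=(\d+) : "
    r"Lang result : (.*?) : R=([-+.\deE]+), C=([-+.\deE]+)"
)


def summarise_log(lines):
    """Count attempts per language and collect the 'Best choice' entries.

    `lines` is any iterable of log lines (for example an open file).
    Returns (tries, choices): a Counter keyed by language code and a
    list of dicts, one per 'Best choice' line.
    """
    tries = Counter()
    choices = []
    for raw in lines:
        # The snippet in this thread wraps each line in '*' (email italics);
        # strip those so pasted and raw log lines are handled the same way.
        line = raw.strip().strip("*").strip()
        m = TRY_RE.search(line)
        if m:
            tries[m.group(1)] += 1
            continue
        m = BEST_RE.search(line)
        if m:
            choices.append({
                "text": m.group(4),
                "accepted": m.group(1) == "1",
                # Assumption: R is the rating and C the certainty of the word.
                "rating": float(m.group(5)),
                "certainty": float(m.group(6)),
            })
    return tries, choices


# Example use (the file name is the one from this thread):
#   with open("tesseract.log", encoding="utf-8", errors="replace") as fh:
#       tries, choices = summarise_log(fh)
#   print(tries)  # how many times each language was tried
#   print([c for c in choices if not c["accepted"]])  # rejected words
```

Comparing the per-language counts against the total number of "Processing word" lines should show whether fo is tried for every word or only for the ones that eng did not accept.
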
As you can see on the last line of the snippet, it says "Trying word using lang fo". I can see this line repeated about 5 times in the log, so it seems that it does sometimes use the fo dictionary. However, I wonder how it works. How does it know when to use fo after looking at eng? Does it only look at fo when it has a box coordinate for a letter or word but cannot find letters to assign to it, and so moves on to the next dictionary? If so, how can it be that entering "fo+eng" in the command instead of "eng+fo" makes no difference to which dictionary is searched first?

On Thursday, 9 August 2018 11:29:11 UTC+1, shree wrote:
>
> The output tesseract.log file should be produced in the directory from
> which you are running the command, usually where your OCR output is created.
>
> On Thu, Aug 9, 2018 at 3:48 PM <[email protected]> wrote:
>
>> Hello Shree, thank you for your prompt reply.
>>
>> I have now changed the logfile as instructed. Where can I find the output
>> tesseract.log file? Will it be produced in the same location as the
>> logfile, in C:\Program Files (x86)\Tesseract-OCR\tessdata\configs ? I'm
>> guessing the tesseract.log file will be produced once I've used logfile in
>> the commands.
>>
>> Kind Regards,
>>
>> Damon
>>
>>
>> On Wednesday, 8 August 2018 19:07:02 UTC+1, shree wrote:
>>>
>>> I think this could happen if your new traineddata is not trained to as
>>> high an accuracy level as the eng traineddata.
>>>
>>> You can set up a debug log to verify this. See
>>> https://github.com/tesseract-ocr/tesseract/issues/1275#issuecomment-360367865
>>> for details.
>>>
>>> On Wed, Aug 8, 2018 at 6:04 PM <[email protected]> wrote:
>>>
>>>> I'm trying to use a combination of two traineddata dictionaries
>>>> together because one of them recognises specific numbers better
>>>> than the other.
>>>>
>>>> Here is an example of the code line.
>>>>
>>>> $codeLine .= '<br>magick convert "'.$filePath.'" -quality 90 -density 300x300 -units PixelsPerInch "'.$output.'.jpg"'; //
>>>> $codeLine .= '<br>tesseract "'.$output.'.jpg" "'.$output.'" -l fo+eng txt pdf';
>>>>
>>>> Despite the fact that I put "fo" in front (this is the one that
>>>> recognises the number 4 better), it still gives me an output text file
>>>> that is identical to the "eng" dictionary output when I run that on its own.
>>>>
>>>> For some reason, it not only prioritises eng but also ignores the fo
>>>> traineddata file completely.
>>>>
>>>> The "fo" file definitely works, as I've tested it on its own.
>>>>
>>>> I have attached an image example of the text I'd like to OCR and the
>>>> two relevant traineddata files.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1a5a6768-baeb-4ba9-9cbd-adda6cba957c%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>> --
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/befd629e-e433-45dd-bf1a-7a5c955e9a61%40googlegroups.com.
>>

--
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/51749bbf-1605-4a12-a26a-0b0a9b0c17a5%40googlegroups.com.

