You can add *training >> data/lang.log &* to the end of your training script (shell) to get a log saved inside your data folder. You can also add *DEBUG_INTERVAL=-1 training >> data/lang.log &*. This one prints more detailed information to the console and saves a short log inside the data folder. If you want to save everything displayed in the console to a log file, you can check out the methods listed here: https://unix.stackexchange.com/questions/200637/save-all-the-terminal-output-to-a-file
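
To make the difference between these logging options concrete, here is a minimal sketch. It assumes a POSIX shell, and `run_training` is a hypothetical stand-in for your real training command, not part of tesseract or tesstrain. The point it illustrates is that `>>` on its own appends only standard output, so anything written to standard error still goes to the console and not to the log; adding `2>&1` or piping through `tee -a` captures both. The lang_full.log and lang_tee.log file names are also just illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for the real training command; replace it with your own.
run_training() {
    echo "iteration 100 (stdout)"
    echo "sample warning (stderr)" >&2
}

mkdir -p data

# 1. Append stdout only, as in the commands above.
#    Messages on stderr still appear on the console, not in the log.
run_training >> data/lang.log

# 2. Append both stdout and stderr to the log; nothing is shown on the console.
run_training >> data/lang_full.log 2>&1

# 3. Show everything on the console and append it to the log at the same time.
run_training 2>&1 | tee -a data/lang_tee.log
```

The sketch runs each command in the foreground so the result is easy to inspect; a trailing *&*, as in the commands above, would run the command in the background instead.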

On Tuesday, October 24, 2023 at 3:45:23 PM UTC+3 [email protected] wrote:

> I have made a first attempt at fine tuning; the script runs for a second and ends
> without any error message. Where can I find a log file?
>
> On Mon, Oct 23, 2023 at 14:01, Keith Smith <[email protected]> wrote:
>
>> Rene, the name “foo” is simply an example (or fictitious) font or
>> language name. When training a new language or font, you should replace
>> “foo” with the name of your language or font. The standard is to choose 3
>> letters, but that is not required. In fact, I have been training a font
>> named “micr_e13b” and it is working technically for me (though the accuracy
>> isn’t good enough yet). Note the underscore character between sections of
>> the name.
>>
>> *From:* [email protected] <[email protected]> on
>> behalf of René JM Clais <[email protected]>
>> *Date:* Sunday, October 22, 2023 at 12:41 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* [EXTERNAL] Re: [tesseract-ocr] Lessons, best practices,
>> recommendations, strategies, hacks
>>
>> Hi Keith,
>>
>> The foo.traineddata does not exist, but do you mean the traineddata I
>> want to train, e.g. hye.traineddata?
>>
>> In my case I need to add a new character to hye.traineddata.
>>
>> It seems that I can do this using option 2!
>>
>> But how? Which command should I use to run this function, and what
>> does this process mean?
>>
>> Thank you for your help
>>
>> Regards
>>
>> René
>>
>> On Sat, Oct 21, 2023 at 17:18, Keith Smith <[email protected]> wrote:
>>
>> Thank you Des for your help in this community. It is greatly appreciated!
>>
>> As one who is struggling, may I make a suggestion.
>>
>> I have started a google doc here
>> <https://docs.google.com/document/d/1Vz6y4LcqczAAE2yKc_xYecy1eChjHZbsxb13_7ntUh0/edit?usp=sharing>
>> with a suggested format for a tutorial which would be very helpful to me
>> and I think to others. It is editable by anyone with the link.
>>
>> I'm glad to put in any work myself, but my guess is that there are things
>> in the doc that could be filled without much effort by you or others.
>>
>> If this is true, once the doc is filled out, the contents of the google
>> doc could be submitted as a PR to the tesstrain repo.
>>
>> Again, just a suggestion that I hope would be helpful to all.
>>
>> Thanks,
>>
>> Keith
>>
>> On Sat, Oct 21, 2023 at 8:28 AM Des Bw <[email protected]> wrote:
>>
>> There is no exhaustive user manual for training tesseract. We all start
>> in the darkness and accumulate bits of information in different places to
>> learn the ins and outs of tesseract.
>>
>> It would be great if we could collectively write a better manual. Until
>> then, we can drop/collect the observations, best practices, hacks and
>> lessons we have accumulated in our adventure with tesseract.
>>
>> I will start with some of my observations. I collected them by reading
>> between the lines and from my own failed experiments:
>>
>> 1. Training from scratch is very difficult because tesseract requires
>> an extensive data set. It looks like it requires over 300,000 test lines
>> (around 26mb text file).
>>
>> https://github.com/tesseract-ocr/tesseract/issues/3909
>>
>> Multiply that by the number of fonts you want to train, and the data grows
>> exponentially. That requires very powerful computers running for weeks and
>> months.
>>
>> So, for regular users, training from a network layer or fine tuning
>> are the most plausible options.
>>
>> 2. Best practice: make your text lines not too long. The recommended
>> number of words in a line is 10-12. Again from the above link.
>>
>> ( ...to be continued)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/bf0cd568-9b5b-4e42-be6e-6225ed6a3892n%40googlegroups.com
>> .

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/befea0b6-cab3-4f02-82bf-b2d1c97a51aen%40googlegroups.com.

