I made a first attempt at fine-tuning. The script ran for about a second and
ended without any error message. Where can I find a log file?

On Mon, Oct 23, 2023 at 14:01, Keith Smith <[email protected]>
wrote:

> Rene, the name “foo” is simply an example (or fictitious) font or language
> name.  When training a new language or font, you should replace “foo” with
> the name of your language or font.  The standard is to choose 3 letters,
> but that is not required.  In fact, I have been training a font named
> “micr_e13b” and it is working technically for me (though the accuracy isn’t
> good enough yet).  Note the underscore character between sections of the
> name.
>
> *From: *[email protected] <[email protected]>
> on behalf of René JM Clais <[email protected]>
> *Date: *Sunday, October 22, 2023 at 12:41 PM
> *To: *[email protected] <[email protected]>
> *Subject: *[EXTERNAL] Re: [tesseract-ocr] Lessons, best practices,
> recommendations, strategies, hacks
>
> Hi Keith,
>
> The file foo.traineddata does not exist. Do you mean the traineddata I
> want to train, e.g. hye.traineddata?
>
> In my case I need to add a new character to hye.traineddata.
>
> It seems that I can do this using option 2!
>
> But how? Which command should I use to run it, and what does this
> process involve?
>
>
>
> Thank you for your help
>
> Regards
>
> René
>
>
>
> On Sat, Oct 21, 2023 at 17:18, Keith Smith <[email protected]>
> wrote:
>
> Thank you Des for your help in this community.  It is greatly appreciated!
>
> As one who is struggling, may I make a suggestion.
>
> I have started a Google Doc here
> <https://docs.google.com/document/d/1Vz6y4LcqczAAE2yKc_xYecy1eChjHZbsxb13_7ntUh0/edit?usp=sharing>
> with a suggested format for a tutorial which would be very helpful to me
> and I think to others. It is editable by anyone with the link.
>
> I'm glad to put in any work myself, but my guess is that there are things
> in the doc that could be filled without much effort by you or others.
>
> If this is true, once the doc is filled out, the contents of the google
> doc could be submitted as a PR to the tesstrain repo.
>
> Again, just a suggestion that I hope would be helpful to all.
>
>
>
> Thanks,
>
> Keith
>
>
>
> On Sat, Oct 21, 2023 at 8:28 AM Des Bw <[email protected]> wrote:
>
> There is no exhaustive user manual for training Tesseract. We all start in
> the dark and accumulate bits of information from different places to
> learn the ins and outs of Tesseract.
>
> It would be great if we could collectively write a better manual. Until
> then, we can collect here the observations, best practices, hacks and
> lessons we have accumulated in our adventures with Tesseract.
>
>
>
> I will start with some of my observations. I gathered them by reading
> between the lines and from my own failed experiments:
>
> 1. Training from scratch is very difficult because Tesseract requires an
> extensive data set. It appears to require over 300,000 text lines
> (a text file of around 26 MB).
>
> https://github.com/tesseract-ocr/tesseract/issues/3909
>
>
>
> Multiply that by the number of fonts you want to train, and the data grows
> accordingly. That requires very powerful computers running for weeks or
> months.
>
> So, for regular users, training from a network layer or fine-tuning
> are the most plausible options.
>
>
>
> 2. Best practice: keep your text lines from getting too long. The
> recommended number of words per line is 10-12, again according to the
> link above.
>
>
>
> ( ...to be continued)
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bf0cd568-9b5b-4e42-be6e-6225ed6a3892n%40googlegroups.com
> <https://urldefense.com/v3/__https://groups.google.com/d/msgid/tesseract-ocr/bf0cd568-9b5b-4e42-be6e-6225ed6a3892n*40googlegroups.com?utm_medium=email&utm_source=footer__;JQ!!MjXRb4uW6x5k!HFOAD-quUbb2dHADKsKiyk_BK3xW49ZAh87HZ3mPU9myi2Zk2t-bdP3ptvhcsV64KhX43EgYbPFZJ5M8KHJKCVc$>
> .
>
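
Regarding point 2 in the quoted message (10-12 words per line), here is a
minimal sketch in Python of how a training text could be re-wrapped to that
length. The function name rewrap and the default of 12 words are assumptions
made for illustration. This is not part of Tesseract or tesstrain, and it
treats any whitespace as a word boundary, which may not suit every script.

```python
def rewrap(text: str, max_words: int = 12) -> list[str]:
    """Split text into lines holding at most max_words words each.

    Words are whatever str.split() yields, i.e. runs of non-whitespace
    characters. Original line breaks are discarded.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    # Hypothetical sample text; replace with the contents of a training text file.
    sample = (
        "There is no exhaustive user manual for training and we all start "
        "in the dark and accumulate bits of information in different places "
        "to learn the ins and outs"
    )
    for line in rewrap(sample):
        print(line)
```

Only the last line of the output can be shorter than the limit. All others
carry exactly max_words words.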
