> How come all the characters that appear are Unicode replacement characters?
> Did I misconfigure something?
>
This could be a locale or encoding issue. The input needs to be a Unicode text file. I open mine in Notepad++ on Windows 10 and encode it as UTF-8. I run training remotely on an Ubuntu machine.

> Is the warning in line 75 important?
>
No. I usually give a 0 in the network spec, and it then uses the number of characters in the unicharset.

Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from data/eng/eng.lstm
Appending a new network to an old one!!Warning: given outputs 1 not equal to unicharset of 130.
Num outputs,weights in Series:
  Lfx96:96, 74112
  Fc130:130, 12610
Total weights = 86722
Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx96Fc130] from request [Lfx 96 O1c1]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.001, momentum=0.5
null char=2

> What does null char=374 in line 93 mean?
>
I don't know. Please look at the unicharset files; they usually have a line related to NULL right near the top.

> On Sat, 22 Feb 2020 at 10:56, Shree Devi Kumar <[email protected]> wrote:
>
>> Try the following, i.e. with a new output name so that training
>> starts again from 0. The debug output for each iteration (line of text)
>> will show you if any particular font is not aligning or if there are
>> other issues.
>>
>> lstmtraining --traineddata data/akk/akk.traineddata \
>>   --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata/akk-1m.traineddata \
>>   --continue_from data/akk-1m/akk.lstm \
>>   --model_output data/akk/checkpoints/akkNEW \
>>   --train_listfile data/akk/list.train \
>>   --eval_listfile data/akk/list.eval \
>>   --max_iterations 1000 --debug_level -1
>>
>> On Sat, Feb 22, 2020 at 2:52 PM Wincent Balin <[email protected]> wrote:
>>
>>> Hello Shree,
>>>
>>> I tried that.
>>> The command was
>>>
>>> lstmtraining --traineddata data/akk/akk.traineddata \
>>>   --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata/akk-1m.traineddata \
>>>   --continue_from data/akk-1m/akk.lstm \
>>>   --model_output data/akk/checkpoints/akk \
>>>   --train_listfile data/akk/list.train \
>>>   --eval_listfile data/akk/list.eval \
>>>   --max_iterations 1000 --debug_level -1
>>>
>>> and the output started with
>>>
>>> Loaded file data/akk/checkpoints/akk_checkpoint, unpacking...
>>> Successfully restored trainer from data/akk/checkpoints/akk_checkpoint
>>> Loaded 1/1 pages (1-1) of document data/akk-ground-truth/P336598.000347.CuneiformComposite.exp0.lstmf
>>> Loaded 1/1 pages (1-1) of document data/akk-ground-truth/P238121.000012.CuneiformNAOutline_Medium.exp0.lstmf
>>>
>>> and ended with
>>>
>>> Loaded 1/1 pages (1-1) of document data/akk-ground-truth/Q005388.000005.Segoe_UI_Historic.exp0.lstmf
>>> At iteration 4716762/4760600/4760600, Mean rms=1.436%, delta=8.366%,
>>> char train=105.86%, word train=86.31%, skip ratio=0%, wrote checkpoint.
>>>
>>> Finished! Error rate = 88.246
>>>
>>> Do I have to retrain completely from scratch, meaning without
>>> loading the previous checkpoint?
>>>
>>> Maybe I should try another approach of yours and train with one font
>>> excluded, so that the LSTM converges.
>>>
>>> Another thought: I tried training Akkadian with Tesseract 4 once before,
>>> but with ground truth consisting of short text files with multiple lines
>>> of text, not one-liners. Obviously I used PSM 6, not PSM 11. Is there
>>> anything wrong with this approach?
>>>
>>> On Monday, 17 February 2020 at 08:23:38 UTC+1, shree wrote:
>>>>
>>>> Try lstmtraining again for 1000 iterations with --debug_level -1
>>>>
>>>> On Mon, Feb 17, 2020, 01:46 Wincent Balin <[email protected]> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> after preparing ground truth files for the Akkadian language, I started
>>>>> the training using the *tesstrain* Makefile, but over 4000000
>>>>> iterations later, the output looks like the following:
>>>>>
>>>>> At iteration 4437804/4478900/4478900, Mean rms=1.453%, delta=9.455%,
>>>>> char train=121.423%, word train=87.461%, skip ratio=0%, wrote checkpoint.
>>>>>
>>>>> Does char train=121% mean a CER of 121%? What could be the cause of
>>>>> such high values even after more than 10 days of training?
>>>>>
>>>>> Yours truly,
>>>>>
>>>>> Wincent
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/79acb8ca-cb51-4e23-8853-ca4b3405a718%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/79acb8ca-cb51-4e23-8853-ca4b3405a718%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
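As a follow-up to the encoding point at the top of this thread, here is a minimal sketch for checking whether text files are valid UTF-8 and whether they already contain the Unicode replacement character (U+FFFD). The function names and the `*.gt.txt` file pattern are assumptions made for illustration; they are not taken from this thread, so adjust them to however your ground-truth text is stored.

```python
from pathlib import Path

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def check_utf8(path):
    """Return a list of human-readable problems found in one text file."""
    problems = []
    raw = Path(path).read_bytes()

    # Some Windows editors prepend a byte order mark when saving as UTF-8.
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte order mark")

    try:
        text = raw.decode("utf-8")  # strict decoding: fails on invalid bytes
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 at byte {exc.start}")
        return problems

    count = text.count(REPLACEMENT)
    if count:
        problems.append(f"contains {count} U+FFFD replacement character(s)")
    return problems


def check_tree(root, pattern="*.gt.txt"):
    """Check every matching file under root; return only files with problems.

    The default pattern is an assumption, not something stated in the thread.
    """
    report = {}
    for path in sorted(Path(root).rglob(pattern)):
        problems = check_utf8(path)
        if problems:
            report[str(path)] = problems
    return report
```

Running `check_tree` on the directory that holds the transcription files returns a dictionary of offending files. An empty result only shows that the files themselves are clean; it does not rule out a locale problem on the training machine.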

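On the `null char` question, the advice in the reply is to look at the unicharset file. Here is a minimal sketch for doing that. It assumes the usual text layout, in which the first line holds the number of entries and every following line starts with the character string followed by space-separated properties. That layout, the function names, and the path in the usage note are assumptions, so verify them against your own file.

```python
def read_unicharset(path):
    """Return (declared_count, entry_names) from a unicharset text file.

    Assumes: line 1 is the entry count, and each later line begins with the
    character string, followed by a space and further properties.
    """
    with open(path, encoding="utf-8") as fh:
        # Split on "\n" only, so unusual Unicode separators stay inside entries.
        lines = fh.read().split("\n")
    declared = int(lines[0].strip())
    entries = [line.split(" ", 1)[0] for line in lines[1:] if line]
    return declared, entries


def describe(path, head=5):
    """Summarise the top of the file and list ids of entries named NULL."""
    declared, entries = read_unicharset(path)
    report = [f"declared entries: {declared}, entries read: {len(entries)}"]
    for index, name in enumerate(entries[:head]):
        report.append(f"id {index}: {name}")
    null_ids = [i for i, name in enumerate(entries) if name == "NULL"]
    report.append(f"ids of entries named NULL: {null_ids}")
    return report
```

Printing the lines returned by `describe("data/akk/akk.unicharset")` (a hypothetical path) lets you compare the number reported as `null char=` in the training log with the total entry count and with the first few entries of the file.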
