>
> How come all the characters that appear are Unicode replacement
> characters? Did I misconfigure something?
>

This could be a locale or encoding issue. The file needs to be a Unicode
text file. I open it in Notepad++ on Windows 10 and encode it as UTF-8. I
run the training remotely on an Ubuntu machine.
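
A quick way to rule out an encoding problem is to check the bytes of a
text file directly. The following is a minimal sketch, not part of
Tesseract or tesstrain; the function names check_utf8 and check_file are
made up for illustration. It reports whether the data decodes as UTF-8,
whether it starts with a byte order mark, and how many U+FFFD replacement
characters it contains.

```python
from pathlib import Path

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def check_utf8(data: bytes) -> dict:
    """Report UTF-8 validity, BOM presence, and the number of U+FFFD characters."""
    has_bom = data.startswith(b"\xef\xbb\xbf")
    try:
        text = data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        # Decode leniently so the damage can still be counted.
        text = data.decode("utf-8", errors="replace")
        valid = False
    return {
        "valid_utf8": valid,
        "has_bom": has_bom,
        "replacement_chars": text.count(REPLACEMENT),
    }


def check_file(path: str) -> dict:
    """Hypothetical helper: run the same check on a file on disk."""
    return check_utf8(Path(path).read_bytes())
```

Calling check_file on one of your own ground truth text files, both on
the Windows side and on the Ubuntu side, should give the same result. A
nonzero replacement_chars count means the damage is already in the file
and not only in how the terminal displays it.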

>
> Is the warning in the line 75 important?
>

No. I usually give 0 as the number of outputs in the network spec, and it
then uses the number of characters in the unicharset.

Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from data/eng/eng.lstm
Appending a new network to an old one!!Warning: given outputs 1 not equal
to unicharset of 130.
Num outputs,weights in Series:
  Lfx96:96, 74112
  Fc130:130, 12610
Total weights = 86722
Built network:[1,36,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx96Fc130] from
request [Lfx 96 O1c1]
Training parameters:
  Debug interval = -1, weights = 0.1, learning rate = 0.001, momentum=0.5
null char=2
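
The weight counts in this log can be reproduced with simple arithmetic.
This is a sketch under assumptions that are not taken from the Tesseract
source: that Lfx96 is a one-dimensional LSTM with four gates, each seeing
the 96-wide output of the preceding layer, its own 96 recurrent states and
one bias term, and that Fc130 is a fully connected layer with one bias per
output. The function names are made up for illustration.

```python
def lstm_weights(n_inputs: int, n_states: int, n_gates: int = 4) -> int:
    """Weights of an LSTM layer: each gate sees input, recurrent state, and a bias."""
    return n_gates * n_states * (n_inputs + n_states + 1)


def fc_weights(n_inputs: int, n_outputs: int) -> int:
    """Weights of a fully connected layer: one row of inputs plus bias per output."""
    return n_outputs * (n_inputs + 1)


lfx96 = lstm_weights(n_inputs=96, n_states=96)   # 4 * 96 * 193
fc130 = fc_weights(n_inputs=96, n_outputs=130)   # 130 * 97
print(lfx96, fc130, lfx96 + fc130)               # 74112 12610 86722
```

The totals match the 74112, 12610 and 86722 printed above, which is
consistent with the output layer having been sized to the 130 entries of
the unicharset and not to the 1 given in the request.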

>
> What does null char=374 in the line 93 mean?
>

I don't know. Please look at the unicharset files; they usually have a line
related to NULL right near the top.
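
To find that line without opening the file by hand, something like the
following could be used. It is a minimal sketch that assumes the first
line of a unicharset file holds the number of entries and that each
following line is one entry whose position gives its id. The function
name and the sample text are made up for illustration, and the sample
lines are simplified. Whether the NULL entry corresponds to the
"null char" value printed by lstmtraining is not established here.

```python
def find_special_entries(unicharset_text: str, names=("NULL",)) -> list:
    """Return (id, line) pairs for entries whose first field is in `names`."""
    lines = unicharset_text.splitlines()
    # Assumption: line 1 is the entry count; the entry on line i + 2 has id i.
    entries = lines[1:]
    hits = []
    for unichar_id, line in enumerate(entries):
        fields = line.split(" ")
        if fields and fields[0] in names:
            hits.append((unichar_id, line))
    return hits


# Made-up, simplified sample; real entries carry more fields per line.
SAMPLE = "3\nNULL 0 Common 0\na 3 Latin 1\nb 3 Latin 2\n"

print(find_special_entries(SAMPLE))
```

For a real file, read its contents with open(path, encoding="utf-8") and
pass the text to find_special_entries.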

>
> On Sat, 22 Feb 2020 at 10:56, Shree Devi Kumar <[email protected]>
> wrote:
>
>> try with the following - ie with a new output name so that training
>> starts again from 0. The debug output for each iteration (line of text)
>> will show you if any particular font is not aligning or if there are some
>> issues.
>>
>> lstmtraining   --traineddata data/akk/akk.traineddata   --old_traineddata
>> /usr/share/tesseract-ocr/4.00/tessdata/akk-1m.traineddata   --continue_from
>> data/akk-1m/akk.lstm   --model_output data/akk/checkpoints/akkNEW
>> --train_listfile data/akk/list.train   --eval_listfile data/akk/list.eval
>> --max_iterations 1000   --debug_level -1
>>
>>
>>
>> On Sat, Feb 22, 2020 at 2:52 PM Wincent Balin <[email protected]>
>> wrote:
>>
>>> Hello Shree,
>>>
>>> I tried that. The command was
>>>
>>> lstmtraining   --traineddata data/akk/akk.traineddata
>>> --old_traineddata
>>> /usr/share/tesseract-ocr/4.00/tessdata/akk-1m.traineddata   --continue_from
>>> data/akk-1m/akk.lstm   --model_output data/akk/checkpoints/akk
>>> --train_listfile data/akk/list.train   --eval_listfile data/akk/list.eval
>>> --max_iterations 1000   --debug_level -1
>>>
>>> and the output started with
>>>
>>> Loaded file data/akk/checkpoints/akk_checkpoint, unpacking...
>>> Successfully restored trainer from data/akk/checkpoints/akk_checkpoint
>>> Loaded 1/1 pages (1-1) of document
>>> data/akk-ground-truth/P336598.000347.CuneiformComposite.exp0.lstmf
>>> Loaded 1/1 pages (1-1) of document
>>> data/akk-ground-truth/P238121.000012.CuneiformNAOutline_Medium.exp0.lstmf
>>>
>>> and ended with
>>>
>>> Loaded 1/1 pages (1-1) of document
>>> data/akk-ground-truth/Q005388.000005.Segoe_UI_Historic.exp0.lstmf
>>> At iteration 4716762/4760600/4760600, Mean rms=1.436%, delta=8.366%,
>>> char train=105.86%, word train=86.31%, skip ratio=0%,  wrote checkpoint.
>>>
>>> Finished! Error rate = 88.246
>>>
>>> Do I have to retrain completely from scratch, meaning without
>>> loading the previous checkpoint?
>>>
>>> Maybe I should try a different approach from yours and train with one
>>> font excluded, so that the LSTM converges.
>>>
>>> Another thought: I tried training Akkadian with Tesseract 4 once before,
>>> but with ground truth consisting of short text files with multiple lines of
>>> text, not one-liners. Obviously I used PSM 6, not PSM 11. Is there anything
>>> wrong with this approach?
>>>
>>>
>>> On Monday, 17 February 2020 08:23:38 UTC+1, shree wrote:
>>>>
>>>> Try lstmtraining again for 1000 iterations with --debug_level -1
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Feb 17, 2020, 01:46 Wincent Balin <[email protected]> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> after preparing ground truth files for the Akkadian language, I started
>>>>> the training using the *tesstrain* Makefile, but over 4000000
>>>>> iterations later, the output looks like the following:
>>>>>
>>>>> At iteration 4437804/4478900/4478900, Mean rms=1.453%, delta=9.455%,
>>>>> char train=121.423%, word train=87.461%, skip ratio=0%,  wrote checkpoint.
>>>>>
>>>>> Does char train=121% mean CER of 121%? What could be the cause for
>>>>> such high values even after over 10 days of training?
>>>>>
>>>>> Yours truly,
>>>>>
>>>>> Wincent
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/79acb8ca-cb51-4e23-8853-ca4b3405a718%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/79acb8ca-cb51-4e23-8853-ca4b3405a718%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>

