The command *tesseract unpack* is not recognized by my version of Tesseract. Is it a utility of your own, or is it already included in a release? In any case, does it only extract the *.box*, *.gt.txt* and *.tif* files? If so, can I simply copy those files into the folder?
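Before copying or sharing a training folder, it can help to confirm that every *.lstmf* file still has the *.tif* and *.box* files it was generated from sitting next to it. A minimal bash sketch, assuming the same-base-name convention visible in the directory listing in this thread (eng.test.pro1.tif, eng.test.pro1.box, eng.test.pro1.lstmf); `check_lstmf_sources` is a hypothetical helper, not a Tesseract tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): for every .lstmf file in a
# directory, report whether the .tif and .box files with the same base
# name are also present. Assumes the <name>.tif / <name>.box / <name>.lstmf
# naming convention.
check_lstmf_sources() {
  local dir="${1:-.}" f base ext missing=0
  for f in "$dir"/*.lstmf; do
    [ -e "$f" ] || continue            # glob matched nothing: no .lstmf files
    base="${f%.lstmf}"                 # strip the extension, keep the path
    for ext in tif box; do
      if [ ! -f "$base.$ext" ]; then
        echo "missing: $base.$ext"
        missing=$((missing + 1))
      fi
    done
  done
  [ "$missing" -eq 0 ]                 # status 0 only if nothing is missing
}
```

Run it as `check_lstmf_sources ~/TEST/lstmf`: it prints one `missing:` line per absent file and returns a non-zero status if anything is missing, so it can also gate a script with `&&`.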
On Thursday, January 16, 2020 at 10:45:59 UTC+1, shree wrote:
>
> Are you sure you have the files in the right places? It seems to work for
> me...
>
> ubuntu@tesseract-ocr:~/tesseract$ cd ../TEST/lstmf
> ubuntu@tesseract-ocr:~/TEST/lstmf$ tesseract unpack eng.test.pro1.lstmf
> Extracting eng.test.pro1.lstmf...
> Loaded 1/1 lines (1-1) of document eng.test.pro1.lstmf
> ubuntu@tesseract-ocr:~/TEST/lstmf$ ls
> eng.test.pro1_0.gt.txt  eng.test.pro1_0.png  eng.test.pro1.box
> eng.test.pro1.lstmf  eng.test.pro1.tif  eng.test.pro5.box
> eng.test.pro5.lstmf  eng.test.pro5.tif  fabio
> ubuntu@tesseract-ocr:~/TEST/lstmf$ tesseract unpack eng.test.pro5.lstmf
> Extracting eng.test.pro5.lstmf...
> Loaded 1/1 lines (1-1) of document eng.test.pro5.lstmf
> ubuntu@tesseract-ocr:~/TEST/lstmf$ ls -1 *.lstmf > all-lstmf
> ubuntu@tesseract-ocr:~/TEST/lstmf$
> ubuntu@tesseract-ocr:~/TEST/lstmf$ rm -rf ./lowercase_cursive
> ubuntu@tesseract-ocr:~/TEST/lstmf$ mkdir -p ./lowercase_cursive
> ubuntu@tesseract-ocr:~/TEST/lstmf$ #
> ubuntu@tesseract-ocr:~/TEST/lstmf$ combine_tessdata -e ~/tessdata_best/eng.traineddata \
> > ./lowercase_cursive/eng.lstm
> Extracting tessdata components from /home/ubuntu/tessdata_best/eng.traineddata
> Wrote ./lowercase_cursive/eng.lstm
> Version string:4.00.00alpha:eng:synth20170629:[1,36,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx512O1c1]
> 17:lstm:size=11689099, offset=192
> 18:lstm-punc-dawg:size=4322, offset=11689291
> 19:lstm-word-dawg:size=3694794, offset=11693613
> 20:lstm-number-dawg:size=4738, offset=15388407
> 21:lstm-unicharset:size=6360, offset=15393145
> 22:lstm-recoder:size=1012, offset=15399505
> 23:version:size=80, offset=15400517
> ubuntu@tesseract-ocr:~/TEST/lstmf$ #
> ubuntu@tesseract-ocr:~/TEST/lstmf$ time lstmtraining \
> > --debug_interval -1 \
> > --model_output ./lowercase_cursive/impact \
> > --continue_from ./lowercase_cursive/eng.lstm \
> > --train_listfile /home/ubuntu/TEST/lstmf/all-lstmf \
> > --traineddata ~/tessdata_best/eng.traineddata \
> > --max_iterations 400
> Loaded file ./lowercase_cursive/eng.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Continuing from ./lowercase_cursive/eng.lstm
> Loaded 1/1 lines (1-1) of document eng.test.pro1.lstmf
> Loaded 1/1 lines (1-1) of document eng.test.pro5.lstmf
> Iteration 0: GROUND TRUTH : nominating any more Labour life Peers
> Iteration 0: ALIGNED TRUTH : nominating any moree Labour life Peers
> Iteration 0: BEST OCR TEXT : wominadng ang wow. Lobowr Lfe_ "Paoro
> File eng.test.pro1.lstmf line 0 :
> Mean rms=3.82%, delta=18.848%, train=75.676%(100%), skip ratio=0%
> Iteration 1: GROUND TRUTH : Griffiths, MP for Mancheste Exchange
> Iteration 1: ALIGNED TRUTH : Griiffiths, MP for Mancheste Exchanngee
> Iteration 1: BEST OCR TEXT : Galbhtha , UP Roe Mowomadl) Cxerlaomqre
> File eng.test.pro5.lstmf line 0 :
> Mean rms=3.908%, delta=20.581%, train=86.449%(100%), skip ratio=0%
> Iteration 2: GROUND TRUTH : nominating any more Labour life Peers
> Iteration 2: BEST OCR TEXT : wominading any wone. Lobowr Lfe. "Paoro
> File eng.test.pro1.lstmf line 0 :
> Mean rms=3.74%, delta=19.305%, train=75.651%(94.444%), skip ratio=0%
> Iteration 3: GROUND TRUTH : Griffiths, MP for Mancheste Exchange
> Iteration 3: ALIGNED TRUTH : Griffiths, MP for Mancheste Exchanngee
> Iteration 3: BEST OCR TEXT : Galbhtha , MUP foe Manomadl) Cxclaomgle
> File eng.test.pro5.lstmf line 0 :
> Mean rms=3.708%, delta=18.921%, train=78.266%(95.833%), skip ratio=0%
> Iteration 4: GROUND TRUTH : nominating any more Labour life Peers
> Iteration 4: BEST OCR TEXT : wominading any wone Loabour Lfe. "Paro
>
> On Wed, Jan 15, 2020 at 8:15 PM 'Fabio Lugli' via tesseract-ocr <tesser...@googlegroups.com> wrote:
>
>> Yes, I forgot to do it in my latest post. I am sharing a couple of the
>> images and their corresponding *.box* and *.lstmf* files. The others I
>> have tried so far are very similar to these ones.
>>
>> On Wednesday, January 15, 2020 at 15:38:23 UTC+1, shree wrote:
>>>
>>> Please share a couple of lstmf files for testing.
>>>
>>> On Wed, Jan 15, 2020 at 8:03 PM 'Fabio Lugli' via tesseract-ocr <tesser...@googlegroups.com> wrote:
>>>
>>>> After some work I am able to:
>>>> - Use the *lstmbox* config of *tesseract.exe* to obtain the *.box* files
>>>>   of my *.tif* images
>>>> - Use the third-party software *JTessBoxEditor* to correct the
>>>>   recognized characters, leaving boxes around the full line of text
>>>> - Use the *lstm.train* config of *tesseract.exe* to obtain the *.lstmf*
>>>>   files from the *.box* files
>>>>
>>>> Now when I run *lstmtraining.exe* with *eng.traineddata* as the starter
>>>> traineddata, I get these errors:
>>>>
>>>> Deserialize header failed: [myfile1].lstmf
>>>> Deserialize header failed: [myfile2].lstmf
>>>> Deserialize header failed: [myfile3].lstmf
>>>> Loaded 1/1 lines (1-1) of document [myfile4].lstmf
>>>> Load of images failed!!
>>>>
>>>> From this I gather there is an error either in the process of creating
>>>> the *.lstmf* files or in the images I have selected. Any suggestion is
>>>> welcome.
>>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesser...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7e075fb6-ac4d-4125-96a6-98d520b88ca3%40googlegroups.com
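The list-building step in the transcript (`ls -1 *.lstmf > all-lstmf`) can be made a little more defensive. An empty *.lstmf* file cannot contain a valid header, so zero-byte files are one cheap thing to rule out when lstmtraining reports "Deserialize header failed"; this is not a confirmed diagnosis for the errors above, and it will not catch other kinds of corruption. A minimal bash sketch; `make_lstmf_list` is a hypothetical helper, not a Tesseract tool. It writes absolute paths so the list file works no matter which directory lstmtraining is started from:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Tesseract): write the absolute paths of
# all non-empty .lstmf files in a directory to a list file for
# lstmtraining --train_listfile. Zero-byte files are skipped with a
# warning on stderr.
make_lstmf_list() {
  local dir="${1:-.}" out="${2:-all-lstmf}" f
  : > "$out"                           # start from an empty list file
  for f in "$dir"/*.lstmf; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -s "$f" ]; then               # -s: file exists and is non-empty
      printf '%s\n' "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" >> "$out"
    else
      echo "skipping empty file: $f" >&2
    fi
  done
}
```

For example, `make_lstmf_list ~/TEST/lstmf ~/TEST/lstmf/all-lstmf`, followed by the same lstmtraining command as in the transcript.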