You need to extract the .lstm file from the traineddata, e.g. (change the folder names to match your setup):
combine_tessdata -e ../tessdata/jpn.traineddata jpn.lstm

Extracting tessdata components from ../tessdata/jpn.traineddata
Wrote jpn.lstm
0:config:size=2573, offset=168
1:unicharset:size=280627, offset=2741
2:unicharambigs:size=4676, offset=283368
3:inttemp:size=30618346, offset=288044
4:pffmtable:size=36561, offset=30906390
5:normproto:size=452735, offset=30942951
6:punc-dawg:size=2602, offset=31395686
7:word-dawg:size=1007922, offset=31398288
8:number-dawg:size=42, offset=32406210
9:freq-dawg:size=1146, offset=32406252
13:shapetable:size=664546, offset=32407398
16:params-model:size=699, offset=33071944
17:lstm:size=10299009, offset=33072643
18:lstm-punc-dawg:size=2602, offset=43371652
19:lstm-word-dawg:size=1005930, offset=43374254
20:lstm-number-dawg:size=50, offset=44380184

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Jun 14, 2017 at 6:45 PM, Ibr <[email protected]> wrote:

> Is this command correct to create the intermediate .lstm and _checkpoint files?
>
> training/lstmtraining --model_output ~/tesstutorial/impact_from_small/impact \
>   --train_listfile ~/tesstutorial/jpntrain/jpn.training_files.txt \
>   --continue_from ~/tesstutorial/impact_from_full/jpn.lstm
>
> As for --continue_from, it is mentioned here
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact>
> that it can be a recognition model, which would be a .lstm file. If not, what
> is the existing model? When I run the command above it says:
>
> Loaded file /home/ibr/tesstutorial/impact_from_full/jpn.traineddata, unpacking...
> Failed to continue from: /home/ibr/tesstutorial/impact_from_full/jpn.traineddata
>
> On Tuesday, June 13, 2017 at 4:28:21 PM UTC+3, shree wrote:
>
>> combine_tessdata -e
>>
>> extracts the lstm file from the traineddata provided from the original
>> training by Google.
>>
>> -----------------
>> tesstrain.sh it will create .lstmf files
>>
>> yes.
>> These are created from the box-tiff pairs created from the training
>> text and fonts.
>>
>> ---------------------------
>>
>> The lstmtraining program takes all of these .lstmf files (via the file
>> that lists all the .lstmf filenames) and creates intermediate .lstm files
>> and _checkpoint files.
>>
>> -------------------------------
>> These can be converted to the final .lstm file for use in traineddata.
>> --------------------------
>> The final .lstm file has to be combined using combine_tessdata to create
>> the new traineddata.
>>
>> ShreeDevi
>>
>> On Tue, Jun 13, 2017 at 6:09 PM, Ibr <[email protected]> wrote:
>>
>>> Thanks for the response. Actually I wrote the command wrong; I
>>> wanted to combine. Also, I didn't extract the lstm file before doing the
>>> combination, which brings up another question.
>>>
>>> If I use tesstrain.sh it will create .lstmf files, correct? But if I
>>> use combine_tessdata -e, that will create an lstm file, so what is the
>>> difference between the two?
>>> I know that lstmf files are a substitute for the .tr files. If you could
>>> give me a little explanation of both I would be grateful, since there is
>>> not much explanation of them on the web.
>>>
>>> Thanks in advance
>>>
>>> On Tuesday, June 13, 2017 at 3:03:40 PM UTC+3, shree wrote:
>>>
>>>> You have to be clear on what files you are combining.
>>>>
>>>> The command you have given is overwriting the Japanese traineddata - is
>>>> that what you want to do?
>>>>
>>>> > *training/combine_tessdata -o tessdata/jpn.traineddata*
>>>>
>>>> *Look at the help for all options of combine_tessdata.*
>>>>
>>>> *Figure out which files (lstm, dawg etc.) you want to combine.*
>>>>
>>>> *Give the appropriate command options and files to create the new traineddata.*
>>>>
>>>> ShreeDevi
>>>>
>>>> On Tue, Jun 13, 2017 at 5:25 PM, Ibr <[email protected]> wrote:
>>>>
>>>>> Seems so. To add or merge the new LSTM files into the traineddata, is
>>>>> this the correct command to use: *training/combine_tessdata -o
>>>>> tessdata/jpn.traineddata ~/tesstutorial/eng_from_chi/.lstm*
>>>>> But that gave me the following:
>>>>> TessdataManager can't determine which tessdata component is
>>>>> represented by lstmf
>>>>> TessdataManager combined tesseract data files.
>>>>> Offset for type 0 (.traineddataconfig ) is 172
>>>>> Offset for type 1 (.traineddataunicharset ) is 2745
>>>>> Offset for type 2 (.traineddataunicharambigs ) is 283372
>>>>> Offset for type 3 (.traineddatainttemp ) is 288048
>>>>> Offset for type 4 (.traineddatapffmtable ) is 30906394
>>>>> Offset for type 5 (.traineddatanormproto ) is 30942955
>>>>> Offset for type 6 (.traineddatapunc-dawg ) is 31395690
>>>>> Offset for type 7 (.traineddataword-dawg ) is 31398292
>>>>> Offset for type 8 (.traineddatanumber-dawg ) is 32406214
>>>>> Offset for type 9 (.traineddatafreq-dawg ) is 32406256
>>>>> Offset for type 10 (.traineddatafixed-length-dawgs ) is -1
>>>>> Offset for type 11 (.traineddatacube-unicharset ) is -1
>>>>> Offset for type 12 (.traineddatacube-word-dawg ) is -1
>>>>> Offset for type 13 (.traineddatashapetable ) is 32407402
>>>>> Offset for type 14 (.traineddatabigram-dawg ) is -1
>>>>> Offset for type 15 (.traineddataunambig-dawg ) is -1
>>>>> Offset for type 16 (.traineddataparams-model ) is 33071948
>>>>> Offset for type 17 (.traineddatalstm ) is 33072647
>>>>> Offset for type 18 (.traineddatalstm-punc-dawg ) is 43371656
>>>>> Offset for type 19 (.traineddatalstm-word-dawg ) is 43374258
>>>>> Offset for type 20 (.traineddatalstm-number-dawg ) is 44380188
>>>>>
>>>>> Any idea?
>>>>> Thanks
>>>>>
>>>>> On Tuesday, June 13, 2017 at 2:36:54 PM UTC+3, shree wrote:
>>>>>
>>>>>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*
>>>>>>
>>>>>> *uses the LSTM files that are in ara.traineddata in your
>>>>>> tessdata directory.*
>>>>>>
>>>>>> *Just placing lstm files in the tesseract folder is not going to change
>>>>>> anything.*
>>>>>>
>>>>>> *You need to create a new traineddata with the new lstm files and
>>>>>> then test with it.*
>>>>>>
>>>>>> ShreeDevi
>>>>>>
>>>>>> On Tue, Jun 13, 2017 at 3:17 PM, Ibr <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> When I run detection using tesseract 4.00.00alpha with the
>>>>>>> command *tesseract image results -l ara --tessdata-dir ./tessdata
>>>>>>> --oem 1*, the oem here means "Neural nets LSTM only". There is no
>>>>>>> argument in tesseract to specify where to find the LSTM files, so how
>>>>>>> does tesseract find them? I used to place the LSTM files inside the
>>>>>>> tesseract folder, but I tried to detect after I deleted the LSTM files,
>>>>>>> with the argument --oem 1, which means LSTM only, yet the detection
>>>>>>> still happened. So does tesseract search other folders for LSTM files?
>>>>>>> I had LSTM files in different folders.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c407-4075-b845-4b226094e752%40googlegroups.com
>>>>>>> For more options, visit https://groups.google.com/d/optout.
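The steps discussed in this thread (extract the .lstm, fine-tune from it, convert a checkpoint to a final .lstm, combine it into a traineddata) could be sketched as the small script below. It is only a dry-run sketch under stated assumptions: by default it prints the commands instead of running them. All paths are placeholders, the checkpoint name impact_checkpoint is assumed from the --model_output prefix, and the --stop_training step is taken from the Tesseract 4.00 training wiki rather than from this thread. Check `lstmtraining --help` and the combine_tessdata help for your version before relying on any of it.

```sh
#!/bin/sh
# Dry-run sketch: prints each command unless DRY_RUN=0 is set.
# Every path used here is a placeholder; adjust to your own setup.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # show the command only
  else
    "$@"                 # actually execute it
  fi
}

# 1. Extract the LSTM model from an existing traineddata file.
extract_lstm() {   # usage: extract_lstm <traineddata> <output .lstm>
  run combine_tessdata -e "$1" "$2"
}

# 2. Fine-tune, starting from the extracted model (writes checkpoints).
fine_tune() {      # usage: fine_tune <output prefix> <listfile> <start .lstm>
  run lstmtraining --model_output "$1" --train_listfile "$2" --continue_from "$3"
}

# 3. Convert a checkpoint to a final .lstm.
#    Assumption: the --stop_training flag as described in the training wiki.
finalize_lstm() {  # usage: finalize_lstm <checkpoint> <output .lstm>
  run lstmtraining --stop_training --continue_from "$1" --model_output "$2"
}

# 4. Overwrite the lstm component of a traineddata file.
#    Work on a copy, since -o modifies the traineddata in place.
combine_lstm() {   # usage: combine_lstm <traineddata> <.lstm>
  run combine_tessdata -o "$1" "$2"
}

extract_lstm ../tessdata/jpn.traineddata jpn.lstm
fine_tune impact_from_small/impact jpn.training_files.txt jpn.lstm
finalize_lstm impact_from_small/impact_checkpoint impact_from_small/jpn.lstm
combine_lstm my_tessdata/jpn.traineddata impact_from_small/jpn.lstm
```

The "can't determine which tessdata component" message quoted in the thread suggests that combine_tessdata identifies a component by its file name suffix, so the sketch passes a file named jpn.lstm rather than a bare .lstm or an .lstmf file.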

