combine_tessdata -e extracts the .lstm file from a traineddata file, such as the ones Google provides from its original training.
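combine_tessdata decides which component a file holds from the text after the last dot in its name. That is why extraction takes a command of the form combine_tessdata -e tessdata/eng.traineddata eng.lstm (the paths here are only examples). The Python sketch below models that lookup. It is an illustration, not Tesseract's code. The suffix list is copied from the 21 component types in the offset table quoted later in this thread, and newer Tesseract releases may define more. component_type is a hypothetical helper name.

```python
# Sketch (not Tesseract's source): how combine_tessdata picks a component
# from a filename. Assumption: it uses the text after the last '.' in the
# path, matched against the suffixes in this thread's offset table.
from typing import Optional

# Position in this tuple == "type N" in the combine_tessdata offset table
# quoted in this thread (types 0-20; newer releases may add more).
COMPONENT_SUFFIXES = (
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
    "fixed-length-dawgs", "cube-unicharset", "cube-word-dawg", "shapetable",
    "bigram-dawg", "unambig-dawg", "params-model", "lstm",
    "lstm-punc-dawg", "lstm-word-dawg", "lstm-number-dawg",
)


def component_type(path: str) -> Optional[int]:
    """Hypothetical helper: component type index for path, or None if unknown."""
    dot = path.rfind(".")
    if dot == -1 or dot == len(path) - 1:
        return None  # no suffix to match
    suffix = path[dot + 1:]
    if suffix in COMPONENT_SUFFIXES:
        return COMPONENT_SUFFIXES.index(suffix)
    return None  # e.g. "lstmf": training data, not a traineddata component


if __name__ == "__main__":
    for name in ("eng.lstm", "eng.lstmf", "jpn.lstm-word-dawg"):
        print(name, "->", component_type(name))
```

Under this rule, a path ending in .lstm maps to type 17 (lstm), and a .lstmf file matches no component. This fits the "can't determine which tessdata component is represented by lstmf" message later in the thread.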
-----------------
tesstrain.sh will create .lstmf files, yes. These are created from the box/tiff pairs that are generated from the training text and fonts.
---------------------------
The lstmtraining program takes all of these .lstmf files (via the list file that contains all the .lstmf filenames) and creates intermediate .lstm files and _checkpoint files.
-------------------------------
These checkpoints can be converted to the final .lstm file for use in a traineddata.
--------------------------
The final .lstm file has to be combined using combine_tessdata to create a new traineddata.

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs and hymns) @ http://bhajans.ramparivar.com

On Tue, Jun 13, 2017 at 6:09 PM, Ibr <[email protected]> wrote:

> Thanks for the response. Actually, I wrote the command wrong: I wanted
> to combine. Also, I didn't extract the lstm file before doing the
> combination, which brings up another question.
>
> If I use tesstrain.sh, it will create .lstmf files, correct? But if I
> use combine_tessdata -e, that will create an lstm file. So what is the
> difference between them?
> I know that lstmf files are a substitute for the .tr files. If you could
> give me a little explanation about both I would be grateful, since there
> is not much explanation on the web about them.
>
> Thanks in advance
>
>
> On Tuesday, June 13, 2017 at 3:03:40 PM UTC+3, shree wrote:
>
>> You have to be clear on what files you are combining.
>>
>> The command you have given is overwriting the Japanese traineddata. Is that
>> what you want to do?
>>
>> > *training/combine_tessdata -o tessdata/jpn.traineddata*
>>
>> *Look at the help for all options of combine_tessdata.*
>>
>> *Figure out which files (lstm, dawg, etc.) you want to combine.*
>>
>> *Give the appropriate command options and files to create the new traineddata.*
>>
>> ShreeDevi
>>
>> On Tue, Jun 13, 2017 at 5:25 PM, Ibr <[email protected]> wrote:
>>
>>> Seems so. To add or merge the new LSTM files into the traineddata, is this
>>> the correct command to use?
>>> *training/combine_tessdata -o tessdata/jpn.traineddata ~/tesstutorial/eng_from_chi/.lstm*
>>> But that gave me the following:
>>> TessdataManager can't determine which tessdata component is represented by lstmf
>>> TessdataManager combined tesseract data files.
>>> Offset for type 0 (.traineddataconfig ) is 172
>>> Offset for type 1 (.traineddataunicharset ) is 2745
>>> Offset for type 2 (.traineddataunicharambigs ) is 283372
>>> Offset for type 3 (.traineddatainttemp ) is 288048
>>> Offset for type 4 (.traineddatapffmtable ) is 30906394
>>> Offset for type 5 (.traineddatanormproto ) is 30942955
>>> Offset for type 6 (.traineddatapunc-dawg ) is 31395690
>>> Offset for type 7 (.traineddataword-dawg ) is 31398292
>>> Offset for type 8 (.traineddatanumber-dawg ) is 32406214
>>> Offset for type 9 (.traineddatafreq-dawg ) is 32406256
>>> Offset for type 10 (.traineddatafixed-length-dawgs ) is -1
>>> Offset for type 11 (.traineddatacube-unicharset ) is -1
>>> Offset for type 12 (.traineddatacube-word-dawg ) is -1
>>> Offset for type 13 (.traineddatashapetable ) is 32407402
>>> Offset for type 14 (.traineddatabigram-dawg ) is -1
>>> Offset for type 15 (.traineddataunambig-dawg ) is -1
>>> Offset for type 16 (.traineddataparams-model ) is 33071948
>>> Offset for type 17 (.traineddatalstm ) is 33072647
>>> Offset for type 18 (.traineddatalstm-punc-dawg ) is 43371656
>>> Offset for type 19 (.traineddatalstm-word-dawg ) is 43374258
>>> Offset for type 20 (.traineddatalstm-number-dawg ) is 44380188
>>>
>>> Any idea?
>>> Thanks
>>>
>>>
>>> On Tuesday, June 13, 2017 at 2:36:54 PM UTC+3, shree wrote:
>>>
>>>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*
>>>>
>>>> *uses the LSTM files that are in ara.traineddata in your tessdata
>>>> directory.*
>>>>
>>>> *Just placing lstm files in the tesseract folder is not going to change
>>>> anything.*
>>>>
>>>> *You need to create a new traineddata with the new lstm files and then
>>>> test with it.*
>>>>
>>>> ShreeDevi
>>>>
>>>> On Tue, Jun 13, 2017 at 3:17 PM, Ibr <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> When running detection with tesseract 4.00.00alpha using the command
>>>>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*,
>>>>> the --oem 1 here means "Neural nets LSTM only". There is no argument in
>>>>> tesseract to specify where to find the LSTM files, so how does
>>>>> tesseract find them? I used to place the LSTM files inside the tesseract
>>>>> folder, but after I deleted them I ran detection again with --oem 1
>>>>> (LSTM only) and the detection still happened. Does tesseract search
>>>>> other folders for LSTM files? I had LSTM files in different folders.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c407-4075-b845-4b226094e752%40googlegroups.com
>>>>> For more options, visit https://groups.google.com/d/optout.
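ShreeDevi's reply at the top of the thread says lstmtraining reads the .lstmf files through a file that lists their names. tesstrain.sh already writes such a list. The Python sketch below only shows what that list contains: one .lstmf path per line. write_lstmf_list is a hypothetical helper. The eng.training_files.txt name in the test mirrors the tesstrain.sh tutorial output, but treat it as an assumption. The resulting file is what you would pass to lstmtraining (via --train_listfile in the Tesseract 4.0 training tutorial).

```python
# Sketch: the list file lstmtraining reads is one .lstmf path per line.
# write_lstmf_list is a hypothetical helper; tesstrain.sh normally writes
# this file for you.
from pathlib import Path
from typing import List


def write_lstmf_list(data_dir: str, list_path: str) -> List[str]:
    """Collect every .lstmf file under data_dir and write their paths to list_path."""
    root = Path(data_dir).expanduser()  # allow "~/..." style paths
    paths = sorted(str(p) for p in root.rglob("*.lstmf"))
    # One path per line, with a trailing newline after the last entry.
    Path(list_path).expanduser().write_text(
        "".join(p + "\n" for p in paths), encoding="utf-8"
    )
    return paths
```

The sketch sorts the paths only to make the output deterministic. lstmtraining does not require any particular order in this sketch's assumptions.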

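The offset table in the 5:25 PM message also shows which components the resulting jpn.traineddata contains. An offset of -1 means that component is absent. The Python sketch below pulls those entries out of such a log. The regular expression assumes the exact line format quoted in this thread, and other Tesseract versions may print it differently. parse_offsets and present_components are hypothetical helper names.

```python
# Sketch: read a combine_tessdata offset table (format as quoted in this
# thread) and report which components are present. An offset of -1 means
# the component is absent. Helper names are hypothetical.
import re
from typing import Dict, List

# Matches e.g. "Offset for type 17 (.traineddatalstm ) is 33072647",
# with or without leading ">>>" quote markers.
OFFSET_RE = re.compile(
    r"Offset for type\s+(\d+)\s+\(\.traineddata(\S+?)\s*\)\s+is\s+(-?\d+)"
)


def parse_offsets(log_text: str) -> Dict[str, int]:
    """Map component name -> byte offset for every offset line in log_text."""
    return {name: int(offset) for _type, name, offset in OFFSET_RE.findall(log_text)}


def present_components(log_text: str) -> List[str]:
    """Names of components whose offset is not -1, in log order."""
    return [name for name, offset in parse_offsets(log_text).items() if offset != -1]
```

In the quoted log, type 17 (lstm) has a real offset, so the file does contain an LSTM component. The offset alone cannot tell you whether that is the original model or a newly trained one.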

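The 2:36 PM reply answers the original question. With -l ara --tessdata-dir ./tessdata, tesseract uses the LSTM model stored inside ara.traineddata in that directory. Loose .lstm files placed elsewhere do not change anything. The Python sketch below makes that lookup explicit and then builds the command line from this thread. It assumes, as in Tesseract 4.x, that --tessdata-dir names the directory holding the .traineddata files. traineddata_path and ocr_command are hypothetical helpers, and the sketch only builds the argument list without running tesseract.

```python
# Sketch: with "-l LANG --tessdata-dir DIR", tesseract loads DIR/LANG.traineddata
# and uses the LSTM model stored inside it. Helper names are hypothetical.
from pathlib import Path
from typing import List


def traineddata_path(lang: str, tessdata_dir: str) -> Path:
    """The model file tesseract will open for this language."""
    return Path(tessdata_dir) / f"{lang}.traineddata"


def ocr_command(image: str, out_base: str, lang: str, tessdata_dir: str) -> List[str]:
    """Build (but do not run) the LSTM-only command line used in this thread."""
    model = traineddata_path(lang, tessdata_dir)
    if not model.is_file():
        # Loose .lstm files are not picked up; the model must be inside the traineddata.
        raise FileNotFoundError(f"{model} not found")
    return ["tesseract", image, out_base, "-l", lang,
            "--tessdata-dir", tessdata_dir, "--oem", "1"]
```

With tesseract installed, subprocess.run(ocr_command("image.png", "results", "ara", "./tessdata"), check=True) would run the same command as in the thread. Testing a retrained model then means replacing or adding a .traineddata file in that directory, not copying .lstm files around.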