Re: [tesseract-ocr] oem Detection

ShreeDevi Kumar Wed, 14 Jun 2017 06:50:25 -0700

You need to extract the .lstm file from the traineddata first.

e.g. (change the folder names to match your setup):


combine_tessdata -e  ../tessdata/jpn.traineddata jpn.lstm
Extracting tessdata components from ../tessdata/jpn.traineddata
Wrote jpn.lstm
0:config:size=2573, offset=168
1:unicharset:size=280627, offset=2741
2:unicharambigs:size=4676, offset=283368
3:inttemp:size=30618346, offset=288044
4:pffmtable:size=36561, offset=30906390
5:normproto:size=452735, offset=30942951
6:punc-dawg:size=2602, offset=31395686
7:word-dawg:size=1007922, offset=31398288
8:number-dawg:size=42, offset=32406210
9:freq-dawg:size=1146, offset=32406252
13:shapetable:size=664546, offset=32407398
16:params-model:size=699, offset=33071944
17:lstm:size=10299009, offset=33072643
18:lstm-punc-dawg:size=2602, offset=43371652
19:lstm-word-dawg:size=1005930, offset=43374254
20:lstm-number-dawg:size=50, offset=44380184
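A minimal bash sketch for checking that a traineddata really contains an lstm component before pointing lstmtraining --continue_from at the extracted file. It assumes the listing format shown above (index:name:size=N, offset=M), as printed by this 4.00 alpha build; list_components and has_component are hypothetical helper names, not part of Tesseract.

```shell
# list_components: read a combine_tessdata listing on stdin and print
# "name size" for each component line, e.g.
#   17:lstm:size=10299009, offset=33072643  ->  lstm 10299009
# Other lines (such as "Wrote jpn.lstm") are ignored.
list_components() {
  awk -F'[:,=]' '/^[0-9]+:[a-z-]+:size=[0-9]+, offset=[0-9]+$/ { print $2, $4 }'
}

# has_component NAME: succeed only if NAME appears in the listing on stdin.
has_component() {
  list_components | awk -v want="$1" '$1 == want { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example: combine_tessdata -e ../tessdata/jpn.traineddata jpn.lstm 2>&1 | has_component lstm && echo "lstm present". The 2>&1 is only there in case the tool writes its listing to stderr rather than stdout.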


ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

On Wed, Jun 14, 2017 at 6:45 PM, Ibr <[email protected]> wrote:

> Is this command correct to create the intermediate .lstm and _checkpoint files?
>
> training/lstmtraining --model_output ~/tesstutorial/impact_from_small/impact \
>   --train_listfile ~/tesstutorial/jpntrain/jpn.training_files.txt \
>   --continue_from ~/tesstutorial/impact_from_full/jpn.lstm
>
> As for --continue_from, it is mentioned here
> <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact>
> that it can be a recognition model, which would be the .lstm file. If not,
> what is the existing model? When I run the command above it says:
> Loaded file /home/ibr/tesstutorial/impact_from_full/jpn.traineddata, unpacking...
> Failed to continue from: /home/ibr/tesstutorial/impact_from_full/jpn.traineddata
>
>
> On Tuesday, June 13, 2017 at 4:28:21 PM UTC+3, shree wrote:
>
>> combine_tessdata -e extracts the .lstm file from the traineddata provided
>> by Google from its original training.
>>
>> -----------------
>> tesstrain.sh will create .lstmf files
>>
>> Yes. These are created from the box/tiff pairs generated from the training
>> text and fonts.
>>
>> ---------------------------
>>
>> The lstmtraining program takes all of these .lstmf files (via a list file
>> containing all the .lstmf filenames) and creates intermediate .lstm files
>> and _checkpoint files.
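tesstrain.sh appears to write this list for you (the jpn.training_files.txt passed to --train_listfile in the command quoted earlier). If you assemble .lstmf files yourself, here is a minimal bash sketch for producing such a list, assuming the expected format is one path per line; make_listfile and its arguments are hypothetical.

```shell
# make_listfile DIR OUT
# Write every *.lstmf file found under DIR to OUT, one path per line
# (sorted), for use as the lstmtraining --train_listfile argument.
# Paths are written the way find reports them, so pass an absolute DIR
# if lstmtraining will run from a different working directory.
make_listfile() {
  local dir="$1" out="$2"
  find "$dir" -type f -name '*.lstmf' | sort > "$out"
  # An empty list leaves lstmtraining nothing to train on, so fail early.
  if [ ! -s "$out" ]; then
    echo "no .lstmf files found under $dir" >&2
    return 1
  fi
}
```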
>>
>> -------------------------------
>> These can be converted to the final .lstm file for use in a traineddata file.
>> --------------------------
>> The final .lstm file has to be combined using combine_tessdata to create
>> a new traineddata file.
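The steps above can be sketched as a dry run in bash. This is a sketch under assumptions, not a verified recipe: the lstmtraining flags (--model_output, --continue_from, --train_listfile, --stop_training) follow the 4.00 alpha TrainingTesseract wiki of that time, and later 4.x releases changed the interface (for example by adding --traineddata); the checkpoint name assumes lstmtraining appends _checkpoint to --model_output; finetune, run, DRY_RUN and all paths are hypothetical. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# run: print the command (default, DRY_RUN unset or 1) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# finetune LANG BASE_TRAINEDDATA OUT_DIR LISTFILE  (all hypothetical placeholders)
finetune() {
  local lang="$1" base="$2" out="$3" list="$4"
  # 1. Pull the existing LSTM recognition model out of the traineddata.
  run combine_tessdata -e "$base" "$out/$lang.lstm"
  # 2. Train from it on the .lstmf files named in LISTFILE; this is where the
  #    intermediate .lstm files and "$out/${lang}_checkpoint" are written.
  run lstmtraining --model_output "$out/$lang" \
    --continue_from "$out/$lang.lstm" --train_listfile "$list"
  # 3. Convert the checkpoint into a final recognition model.
  run lstmtraining --stop_training \
    --continue_from "$out/${lang}_checkpoint" \
    --model_output "$out/${lang}_final.lstm"
  # 4. Copy the original traineddata and overwrite only its lstm component
  #    in the copy, so the stock file stays untouched.
  run cp "$base" "$out/$lang.traineddata"
  run combine_tessdata -o "$out/$lang.traineddata" "$out/${lang}_final.lstm"
}
```

For example: finetune jpn ../tessdata/jpn.traineddata ~/tesstutorial/jpn_tuned ~/tesstutorial/jpntrain/jpn.training_files.txt. Prefix the tool names with training/ if, as in this thread, you run them from the build tree. Step 2 passes the extracted .lstm to --continue_from, not the .traineddata, which is what the "Failed to continue from" error above points at.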
>>
>>
>> ShreeDevi
>>
>> On Tue, Jun 13, 2017 at 6:09 PM, Ibr <[email protected]> wrote:
>>
>>> Thanks for the response. Actually, I wrote the command wrong: I wanted to
>>> combine. Also, I didn't extract the lstm file before doing the
>>> combination, which brings up another question.
>>>
>>> If I use tesstrain.sh it will create .lstmf files, correct? But if I use
>>> combine_tessdata -e, that will create an .lstm file. So what is the
>>> difference between them?
>>> I know that .lstmf files are the substitute for the .tr files. If you could
>>> give me a little explanation about both I would be grateful, since there is
>>> not much explanation on the web about them.
>>>
>>> Thanks in advance
>>>
>>>
>>> On Tuesday, June 13, 2017 at 3:03:40 PM UTC+3, shree wrote:
>>>
>>>> You have to be clear about which files you are combining.
>>>>
>>>> The command you have given overwrites the Japanese traineddata. Is that
>>>> what you want to do?
>>>>
>>>> > training/combine_tessdata -o tessdata/jpn.traineddata
>>>>
>>>> Look at the help for all the options of combine_tessdata.
>>>>
>>>> Figure out which files (lstm, dawg, etc.) you want to combine.
>>>>
>>>> Give the appropriate command options and files to create a new traineddata.
>>>>
>>>> ShreeDevi
>>>>
>>>> On Tue, Jun 13, 2017 at 5:25 PM, Ibr <[email protected]> wrote:
>>>>
>>>>> It seems so. To add or merge the new LSTM files into the traineddata, is
>>>>> this the correct command to use?
>>>>> *training/combine_tessdata -o tessdata/jpn.traineddata ~/tesstutorial/eng_from_chi/.lstm*
>>>>> It gave me the following:
>>>>> TessdataManager can't determine which tessdata component is represented by lstmf
>>>>> TessdataManager combined tesseract data files.
>>>>> Offset for type  0 (.traineddataconfig                ) is 172
>>>>> Offset for type  1 (.traineddataunicharset            ) is 2745
>>>>> Offset for type  2 (.traineddataunicharambigs         ) is 283372
>>>>> Offset for type  3 (.traineddatainttemp               ) is 288048
>>>>> Offset for type  4 (.traineddatapffmtable             ) is 30906394
>>>>> Offset for type  5 (.traineddatanormproto             ) is 30942955
>>>>> Offset for type  6 (.traineddatapunc-dawg             ) is 31395690
>>>>> Offset for type  7 (.traineddataword-dawg             ) is 31398292
>>>>> Offset for type  8 (.traineddatanumber-dawg           ) is 32406214
>>>>> Offset for type  9 (.traineddatafreq-dawg             ) is 32406256
>>>>> Offset for type 10 (.traineddatafixed-length-dawgs    ) is -1
>>>>> Offset for type 11 (.traineddatacube-unicharset       ) is -1
>>>>> Offset for type 12 (.traineddatacube-word-dawg        ) is -1
>>>>> Offset for type 13 (.traineddatashapetable            ) is 32407402
>>>>> Offset for type 14 (.traineddatabigram-dawg           ) is -1
>>>>> Offset for type 15 (.traineddataunambig-dawg          ) is -1
>>>>> Offset for type 16 (.traineddataparams-model          ) is 33071948
>>>>> Offset for type 17 (.traineddatalstm                  ) is 33072647
>>>>> Offset for type 18 (.traineddatalstm-punc-dawg        ) is 43371656
>>>>> Offset for type 19 (.traineddatalstm-word-dawg        ) is 43374258
>>>>> Offset for type 20 (.traineddatalstm-number-dawg      ) is 44380188
>>>>>
>>>>> Any idea?
>>>>> Thanks
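The error above comes from combine_tessdata -o having to decide which component each file replaces. Below is a simplified bash model of that decision, assuming (as the message suggests) that only the text after the file's last dot is consulted and that it must match one of the component names in the offset table above. component_for_file and TESSDATA_COMPONENTS are hypothetical names, and this is not Tesseract's source. Under this model a .lstmf training file is rejected, while a final recognition model such as jpn_final.lstm maps to the lstm component.

```shell
# Component names as listed in the offset table above (types 0-20).
TESSDATA_COMPONENTS="config unicharset unicharambigs inttemp pffmtable
normproto punc-dawg word-dawg number-dawg freq-dawg fixed-length-dawgs
cube-unicharset cube-word-dawg shapetable bigram-dawg unambig-dawg
params-model lstm lstm-punc-dawg lstm-word-dawg lstm-number-dawg"

# component_for_file FILE: print the component FILE would replace under the
# simplified model, or fail with a message modelled on the log above.
component_for_file() {
  local base="${1##*/}"             # strip any directory part
  case "$base" in
    *.*) ;;                          # has a dot: continue
    *) echo "no suffix: $1" >&2; return 1 ;;
  esac
  local suffix="${base##*.}"         # text after the last dot
  local c
  for c in $TESSDATA_COMPONENTS; do  # unquoted on purpose: split on whitespace
    if [ "$c" = "$suffix" ]; then
      printf '%s\n' "$c"
      return 0
    fi
  done
  echo "can't determine which tessdata component is represented by $suffix" >&2
  return 1
}
```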
>>>>>
>>>>>
>>>>> On Tuesday, June 13, 2017 at 2:36:54 PM UTC+3, shree wrote:
>>>>>
>>>>>> tesseract image results -l ara --tessdata-dir ./tessdata --oem 1
>>>>>>
>>>>>> uses the LSTM components stored inside ara.traineddata in your
>>>>>> tessdata directory.
>>>>>>
>>>>>> Just placing .lstm files in the tesseract folder is not going to change
>>>>>> anything.
>>>>>>
>>>>>> You need to create a new traineddata file with the new .lstm file and
>>>>>> then test with it.
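A minimal bash sketch of the lookup described above, assuming --tessdata-dir names the directory that directly holds LANG.traineddata and that a single language is requested (no ara+eng combinations); model_path is a hypothetical helper. It illustrates the point made above: tesseract loads one traineddata file per language, and loose .lstm files elsewhere play no part.

```shell
# model_path TESSDATA_DIR LANG
# Print the traineddata file expected to be loaded for
#   tesseract IMAGE OUT -l LANG --tessdata-dir TESSDATA_DIR
# or report it missing. Assumptions: one language, and TESSDATA_DIR
# directly contains LANG.traineddata.
model_path() {
  local dir="${1%/}"                 # tolerate one trailing slash
  local path="$dir/$2.traineddata"
  if [ ! -f "$path" ]; then
    echo "missing: $path" >&2
    return 1
  fi
  printf '%s\n' "$path"
}
```

To test a new model without touching the stock ara.traineddata, one option is to put the new file in its own directory (for example a hypothetical ~/tessdata_new/ara.traineddata) and run tesseract image results -l ara --tessdata-dir ~/tessdata_new --oem 1.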
>>>>>>
>>>>>> ShreeDevi
>>>>>>
>>>>>> On Tue, Jun 13, 2017 at 3:17 PM, Ibr <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> When doing detection with tesseract 4.00.00alpha using the command
>>>>>>> tesseract image results -l ara --tessdata-dir ./tessdata --oem 1
>>>>>>> the oem here means "Neural nets LSTM only", but there is no argument in
>>>>>>> tesseract to specify where to find the LSTM files, so how does tesseract
>>>>>>> find them? I used to place the LSTM files inside the tesseract folder,
>>>>>>> but after I deleted them and ran detection again with --oem 1 (LSTM
>>>>>>> only), the detection still worked. Does tesseract search other folders
>>>>>>> for LSTM files? I had LSTM files in several different folders.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c407-4075-b845-4b226094e752%40googlegroups.com
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>