Re: [tesseract-ocr] oem Detection

ShreeDevi Kumar Tue, 13 Jun 2017 06:28:36 -0700

combine_tessdata -e

extracts the existing .lstm component from a traineddata file, for example
the traineddata that Google produced from its original training.
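
A minimal sketch of that extraction step, assuming the traineddata lives in ./tessdata and is named ara.traineddata (both paths are placeholders, not taken from this thread). combine_tessdata picks the component from the extension of the output file name, so the output has to end in .lstm:

```shell
#!/bin/sh
# Hypothetical paths; adjust to your own setup.
TRAINEDDATA=./tessdata/ara.traineddata
OUT_PREFIX=./extracted/ara

# -e extracts the component named by the output file's extension (.lstm here).
extract_cmd="combine_tessdata -e $TRAINEDDATA $OUT_PREFIX.lstm"
echo "$extract_cmd"

# Run it only if the tool and the input file are actually present.
# (Unquoted expansion is fine here because the paths contain no spaces.)
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$TRAINEDDATA" ]; then
  mkdir -p "$(dirname "$OUT_PREFIX")"
  $extract_cmd
fi
```

The extracted ara.lstm is a recognition model, not training data, which is why it is unrelated to the .lstmf files.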


-----------------
> tesstrain.sh it will create .lstmf files

Yes. These are training-data files that tesstrain.sh builds from the box/tiff
pairs it renders from the training text and fonts.

---------------------------

The lstmtraining program takes all of these .lstmf files (via a list file
that names every .lstmf file) and creates intermediate .lstm files and
_checkpoint files.
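
A rough sketch of how the list file and the training call fit together. The directory names, the ara prefix and the iteration count are assumptions for illustration, and tesstrain.sh normally writes a list file of its own. Depending on the Tesseract version, lstmtraining may need further options (newer builds expect --traineddata, for example):

```shell
#!/bin/sh
# Hypothetical locations; this only illustrates what the list file contains.
LSTMF_DIR=./train_output
LISTFILE=./train_output/ara.training_files.txt

# Write the path of every .lstmf file in a directory to a list file, one per line.
make_listfile() {
  dir=$1
  out=$2
  : > "$out"
  for f in "$dir"/*.lstmf; do
    if [ -e "$f" ]; then
      echo "$f" >> "$out"
    fi
  done
}

# Training command that reads the list file. The model given to
# --continue_from is a placeholder for a previously extracted .lstm file.
train_cmd="lstmtraining --continue_from ./extracted/ara.lstm --train_listfile $LISTFILE --model_output ./train_output/ara --max_iterations 3000"
echo "$train_cmd"

# Only touch the filesystem or start training when the inputs really exist.
if [ -d "$LSTMF_DIR" ]; then
  make_listfile "$LSTMF_DIR" "$LISTFILE"
  if command -v lstmtraining >/dev/null 2>&1 && [ -s "$LISTFILE" ]; then
    $train_cmd
  fi
fi
```

The value given to --model_output is used as a file name prefix for the intermediate files that lstmtraining writes.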

-------------------------------
These intermediate files can be converted to the final .lstm file for use in
a traineddata.
--------------------------
The final .lstm file then has to be combined, using combine_tessdata, to
create the new traineddata.
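
The last two steps could look roughly like this. All paths are placeholders, and the sketch works on a copy of the traineddata so that the original is not overwritten. In later 4.0 builds, --stop_training can, as far as I know, write a complete traineddata directly when --traineddata is given, so check the help output of your version first:

```shell
#!/bin/sh
# All paths are hypothetical placeholders.
CHECKPOINT=./train_output/ara_checkpoint
FINAL_LSTM=./train_output/ara.lstm
ORIG=./tessdata/ara.traineddata
NEW=./tessdata_new/ara.traineddata

# Step 1: convert the training checkpoint into a final recognition model.
stop_cmd="lstmtraining --stop_training --continue_from $CHECKPOINT --model_output $FINAL_LSTM"
# Step 2: overwrite the lstm component inside a copy of the traineddata.
# The component is identified by the .lstm extension of the input file.
combine_cmd="combine_tessdata -o $NEW $FINAL_LSTM"
printf '%s\n' "$stop_cmd" "$combine_cmd"

# Run the steps only when the tools and inputs are really there.
if command -v lstmtraining >/dev/null 2>&1 && [ -f "$CHECKPOINT" ] && [ -f "$ORIG" ]; then
  mkdir -p "$(dirname "$NEW")"
  cp "$ORIG" "$NEW"
  $stop_cmd && $combine_cmd
fi
```

The result can then be tested by pointing --tessdata-dir at the new directory, e.g. tesseract image results -l ara --tessdata-dir ./tessdata_new --oem 1.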


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Jun 13, 2017 at 6:09 PM, Ibr <[email protected]> wrote:

> Thanks for the response. Actually I wrote the command wrong: I wanted to
> combine, and I also didn't extract the lstm file before doing the
> combination, which brings up another question.
>
> If I use tesstrain.sh it will create .lstmf files, correct? But if I use
> combine_tessdata -e, that will create an lstm file, so what is the
> difference between the two?
> I know that .lstmf files are a substitute for the .tr files. I would be
> grateful for a short explanation of both, since there is not much
> explanation of them on the web.
>
> Thanks in advance
>
>
> On Tuesday, June 13, 2017 at 3:03:40 PM UTC+3, shree wrote:
>
>> you have to be clear on what files you are combining.
>>
>> the command you have given is overwriting japanese traineddata - is that
>> what you want to do?
>>
>> > *training/combine_tessdata -o tessdata/jpn.traineddata*
>>
>> *Look at help for all options of combine_tessdata*
>>
>> *Figure out which files (lstm, dawg etc) you want to combine*
>>
>> *Give appropriate command options and files to create new traineddata*
>>
>> ShreeDevi
>>
>> On Tue, Jun 13, 2017 at 5:25 PM, Ibr <[email protected]> wrote:
>>
>>> Seems so. To add or merge the new LSTM files into the traineddata, is
>>> this the correct command to use: *training/combine_tessdata -o
>>> tessdata/jpn.traineddata ~/tesstutorial/eng_from_chi/.lstm*
>>> That gave me the following:
>>> TessdataManager can't determine which tessdata component is represented
>>> by lstmf
>>> TessdataManager combined tesseract data files.
>>> Offset for type  0 (.traineddataconfig                ) is 172
>>> Offset for type  1 (.traineddataunicharset            ) is 2745
>>> Offset for type  2 (.traineddataunicharambigs         ) is 283372
>>> Offset for type  3 (.traineddatainttemp               ) is 288048
>>> Offset for type  4 (.traineddatapffmtable             ) is 30906394
>>> Offset for type  5 (.traineddatanormproto             ) is 30942955
>>> Offset for type  6 (.traineddatapunc-dawg             ) is 31395690
>>> Offset for type  7 (.traineddataword-dawg             ) is 31398292
>>> Offset for type  8 (.traineddatanumber-dawg           ) is 32406214
>>> Offset for type  9 (.traineddatafreq-dawg             ) is 32406256
>>> Offset for type 10 (.traineddatafixed-length-dawgs    ) is -1
>>> Offset for type 11 (.traineddatacube-unicharset       ) is -1
>>> Offset for type 12 (.traineddatacube-word-dawg        ) is -1
>>> Offset for type 13 (.traineddatashapetable            ) is 32407402
>>> Offset for type 14 (.traineddatabigram-dawg           ) is -1
>>> Offset for type 15 (.traineddataunambig-dawg          ) is -1
>>> Offset for type 16 (.traineddataparams-model          ) is 33071948
>>> Offset for type 17 (.traineddatalstm                  ) is 33072647
>>> Offset for type 18 (.traineddatalstm-punc-dawg        ) is 43371656
>>> Offset for type 19 (.traineddatalstm-word-dawg        ) is 43374258
>>> Offset for type 20 (.traineddatalstm-number-dawg      ) is 44380188
>>>
>>> any idea?
>>> thanks
>>>
>>>
>>> On Tuesday, June 13, 2017 at 2:36:54 PM UTC+3, shree wrote:
>>>
>>>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*
>>>>
>>>> *uses the LSTM files that are there in ara.traineddata in your tessdata
>>>> directory.*
>>>>
>>>> *Just placing lstm files in tesseract folder is not going to change
>>>> anything.*
>>>>
>>>> *You need to create a new traineddata with the new lstm files and then
>>>> test with it.*
>>>>
>>>> ShreeDevi
>>>>
>>>> On Tue, Jun 13, 2017 at 3:17 PM, Ibr <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> When I run detection with tesseract 4.00.00alpha using the command
>>>>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*,
>>>>> the oem value means "Neural nets LSTM only", but there is no argument
>>>>> that tells tesseract where to find the LSTM files. How does tesseract
>>>>> find them? I used to place the LSTM files inside the tesseract folder,
>>>>> but after I deleted them and ran with --oem 1 (LSTM only), detection
>>>>> still worked. Does tesseract search other folders for LSTM files? I
>>>>> had LSTM files in several different folders.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c40
>>>>> 7-4075-b845-4b226094e752%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c407-4075-b845-4b226094e752%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>
>>
>

