Re: [tesseract-ocr] oem Detection

ShreeDevi Kumar Tue, 13 Jun 2017 05:04:24 -0700

You have to be clear about which files you are combining.

The command you have given overwrites the Japanese traineddata. Is that
what you want to do?


> *training/combine_tessdata -o tessdata/jpn.traineddata*
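If the goal is to try a new LSTM model without touching tessdata/jpn.traineddata, one safer pattern is to copy it under a new language name and run `-o` on the copy only. The script below is a minimal sketch under stated assumptions. The function `make_custom_traineddata`, the `COMBINE` variable and the name `jpn_custom` are all hypothetical. It also assumes the new model is a single file ending in `.lstm`, and that `-o` picks the component from the file suffix, as the `lstmf` error in the quoted output suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): create NEW.traineddata as a copy
# of BASE.traineddata, then overwrite only its lstm component in the copy.
# COMBINE is an assumed variable: point it at your combine_tessdata binary,
# e.g. COMBINE=training/combine_tessdata when working in the source tree.
COMBINE=${COMBINE:-combine_tessdata}

make_custom_traineddata() {
    # $1 = tessdata dir, $2 = base language, $3 = new language, $4 = model file
    if [ "$#" -ne 4 ]; then
        echo "usage: make_custom_traineddata DIR BASE_LANG NEW_LANG MODEL.lstm" >&2
        return 2
    fi
    case "$4" in
        *.lstm) ;;  # assumed: the suffix is what selects the lstm component
        *) echo "error: $4 does not end in .lstm" >&2; return 1 ;;
    esac
    if [ ! -f "$4" ] || [ ! -f "$1/$2.traineddata" ]; then
        echo "error: missing $4 or $1/$2.traineddata" >&2
        return 1
    fi
    if [ -e "$1/$3.traineddata" ]; then
        echo "error: $1/$3.traineddata already exists; refusing to touch it" >&2
        return 1
    fi
    # Copy first, so the base traineddata itself is never modified.
    cp "$1/$2.traineddata" "$1/$3.traineddata" || return 1
    # -o overwrites, inside the copy only, the component named by the suffix.
    "$COMBINE" -o "$1/$3.traineddata" "$4"
}
```

With the function loaded, `make_custom_traineddata tessdata jpn jpn_custom path/to/model.lstm` (the model path is a placeholder) followed by `tesseract image results -l jpn_custom --tessdata-dir ./tessdata --oem 1` would test the new model while the stock jpn.traineddata stays as it was. Refusing to reuse an existing target name is a deliberate choice, so a second run cannot silently stack changes on an earlier copy.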

*Look at the help text for all the options of combine_tessdata.*

*Figure out which files (lstm, dawg, etc.) you want to combine.*

*Give the appropriate command options and files to create the new traineddata.*
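For the step of figuring out which files to combine, it helps to know how a file is matched to a component. Judging from the `can't determine which tessdata component is represented by lstmf` message in the quoted output, combine_tessdata looks at the text after the last dot of the file name. The hypothetical pre-check below (`tessdata_component` is not a Tesseract tool) applies that same rule. Its component list is taken from the offset dump in the quoted output, so it assumes 4.00.00alpha naming; other versions may know more or fewer names.

```shell
#!/bin/sh
# Hypothetical pre-check (not a Tesseract tool): print the tessdata component
# a file name maps to, using the text after the last dot of its base name,
# or fail when that suffix is not a known component. The list below is the
# one printed in the 4.00.00alpha offset dump; other versions may differ.
tessdata_component() {
    tc_base=${1##*/}          # strip any directory part
    tc_suffix=${tc_base##*.}  # text after the last dot
    case "$tc_suffix" in
        config|unicharset|unicharambigs|inttemp|pffmtable|normproto|\
        punc-dawg|word-dawg|number-dawg|freq-dawg|fixed-length-dawgs|\
        cube-unicharset|cube-word-dawg|shapetable|bigram-dawg|\
        unambig-dawg|params-model|lstm|lstm-punc-dawg|lstm-word-dawg|\
        lstm-number-dawg)
            printf '%s\n' "$tc_suffix" ;;
        *)
            echo "$1: no tessdata component for suffix '$tc_suffix'" >&2
            return 1 ;;
    esac
}
```

For example, `for f in ~/tesstutorial/eng_from_chi/*; do tessdata_component "$f"; done` would show which file names in that directory carry a recognised suffix. A `.lstmf` training file is rejected, just as in the quoted error. A recognised suffix says nothing about whether the file's contents are compatible with the target traineddata.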

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Jun 13, 2017 at 5:25 PM, Ibr <[email protected]> wrote:

> It seems so. To add or merge the new LSTM files into the traineddata, I
> thought this was the correct command: *training/combine_tessdata -o
> tessdata/jpn.traineddata ~/tesstutorial/eng_from_chi/.lstm*
> but it gave me the following:
> TessdataManager can't determine which tessdata component is represented by
> lstmf
> TessdataManager combined tesseract data files.
> Offset for type  0 (.traineddataconfig                ) is 172
> Offset for type  1 (.traineddataunicharset            ) is 2745
> Offset for type  2 (.traineddataunicharambigs         ) is 283372
> Offset for type  3 (.traineddatainttemp               ) is 288048
> Offset for type  4 (.traineddatapffmtable             ) is 30906394
> Offset for type  5 (.traineddatanormproto             ) is 30942955
> Offset for type  6 (.traineddatapunc-dawg             ) is 31395690
> Offset for type  7 (.traineddataword-dawg             ) is 31398292
> Offset for type  8 (.traineddatanumber-dawg           ) is 32406214
> Offset for type  9 (.traineddatafreq-dawg             ) is 32406256
> Offset for type 10 (.traineddatafixed-length-dawgs    ) is -1
> Offset for type 11 (.traineddatacube-unicharset       ) is -1
> Offset for type 12 (.traineddatacube-word-dawg        ) is -1
> Offset for type 13 (.traineddatashapetable            ) is 32407402
> Offset for type 14 (.traineddatabigram-dawg           ) is -1
> Offset for type 15 (.traineddataunambig-dawg          ) is -1
> Offset for type 16 (.traineddataparams-model          ) is 33071948
> Offset for type 17 (.traineddatalstm                  ) is 33072647
> Offset for type 18 (.traineddatalstm-punc-dawg        ) is 43371656
> Offset for type 19 (.traineddatalstm-word-dawg        ) is 43374258
> Offset for type 20 (.traineddatalstm-number-dawg      ) is 44380188
>
> Any idea?
> Thanks
>
>
> On Tuesday, June 13, 2017 at 2:36:54 PM UTC+3, shree wrote:
>
>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*
>>
>> *uses the LSTM model packed inside ara.traineddata in your tessdata
>> directory.*
>>
>> *Just placing LSTM files in the tesseract folder will not change
>> anything.*
>>
>> *You need to create a new traineddata with the new LSTM files and then
>> test with it.*
>>
>> ShreeDevi
>>
>> On Tue, Jun 13, 2017 at 3:17 PM, Ibr <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> When I run tesseract 4.00.00alpha with the command
>>> *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1*, the
>>> oem value means "Neural nets LSTM only". There is no argument in
>>> tesseract to specify where to find the LSTM files, so how does tesseract
>>> find them? I used to place the LSTM files inside the tesseract folder,
>>> but after I deleted them and ran it again with --oem 1 (LSTM only), the
>>> text was still detected. Does tesseract search other folders for LSTM
>>> files? I had LSTM files in several different folders.
>>>
>>> Thanks.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/eefc8290-c407-4075-b845-4b226094e752%40googlegroups.com
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

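In the offset dump quoted above, each line gives a component's byte offset inside the combined file, and an offset of -1 means that component is absent. The awk filter below is a sketch that reduces such a dump to a present/absent list, which makes it quick to confirm that the lstm component is there before testing with `--oem 1`. The function name `summarize_offsets` is hypothetical. It assumes the exact line format shown in that output, including the `.traineddata` prefix printed before each component name.

```shell
#!/bin/sh
# Sketch: read a combine_tessdata offset dump on stdin and print each
# component with "present" or "absent". Assumes lines of the form
# "Offset for type N (NAME   ) is OFFSET", where -1 means absent.
summarize_offsets() {
    awk '/^Offset for type/ {
        name = $5
        sub(/^\(/, "", name)             # drop the opening parenthesis
        sub(/\)$/, "", name)             # drop a closing one if attached
        sub(/^\.traineddata/, "", name)  # prefix as printed in the dump
        status = ($NF == -1) ? "absent" : "present"
        printf "%-20s %s\n", name, status
    }'
}
```

For instance, `summarize_offsets < combine.log`, where combine.log is a placeholder for the saved output of a combine_tessdata run, would list `lstm present` for the dump quoted above.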
