Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-05 Thread Shree Devi Kumar
I haven't tried user_words yet.
Pre-processing the image gets you better results.

It works with the modified image and the following pattern:

\A\d\d\d\d\A\A\d\d\d
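
To make the pattern above concrete: as far as I understand the pattern syntax described in Tesseract's trie.h, \A stands for one uppercase letter and \d for one digit, so the pattern describes strings shaped like L9143CO123. The sketch below is only an illustration of that reading, not Tesseract's own matcher; the function name MatchesUserPattern is hypothetical and only the two classes used in this thread are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `text` has exactly the shape described by `pattern`.
// Handles literals plus the two classes used in this thread:
//   \A = one uppercase letter, \d = one digit (assumed meaning, see lead-in).
// Simplified illustration; Tesseract's real matching is done in its trie code.
bool MatchesUserPattern(const std::string& pattern, const std::string& text) {
  std::size_t t = 0;  // current position in text
  for (std::size_t p = 0; p < pattern.size(); ++p) {
    if (t >= text.size()) return false;  // text is shorter than the pattern
    const unsigned char c = static_cast<unsigned char>(text[t]);
    if (pattern[p] == '\\' && p + 1 < pattern.size()) {
      const char cls = pattern[++p];  // class letter after the backslash
      if (cls == 'd') {
        if (!std::isdigit(c)) return false;
      } else if (cls == 'A') {
        if (!std::isupper(c)) return false;
      } else {
        return false;  // class not covered by this sketch
      }
    } else if (pattern[p] != text[t]) {
      return false;  // literal mismatch
    }
    ++t;
  }
  return t == text.size();  // no trailing characters allowed
}
```

With this reading, both \A\d\d\d\d\A\A\d\d\d and L9143CO\d\d\d accept L9143CO123 and reject L9143C0123 (digit zero in place of the letter O), which is the distinction the pattern is meant to enforce.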



On Fri, Jul 5, 2019 at 1:55 PM Jochen Naumann 
wrote:

> Thanks, Shree. I appreciate your help!
> I tried your example and it works with your image. But it does not work
> with the attached image test2.jpg. Tesseract always reads the O as 0,
> although I provided the following pattern: L9143CO\d\d\d
> I added the user_words_file parameter to the config file, but the setting
> is ignored (a file monitor shows that my.patterns is accessed, but the
> Tesseract API never tries to open a file called my.user-words).
>
> my config file:
>
> user_patterns_file my.patterns
> user_words_file my.user-words
> lstm_use_matrix 1
>
> Have a nice day.
>
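
The config file quoted above names my.patterns and my.user-words without a directory. One way to rule out working-directory differences between the command-line tool and a program using the API is to write absolute paths into the config. The sketch below generates the three files that way. It is a minimal sketch under assumptions: the config file name my.config, the word list content, and both helper names are hypothetical, and I am assuming the file names are opened as given.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Writes `content` to `path`; returns false if the file could not be written.
bool WriteTextFile(const fs::path& path, const std::string& content) {
  std::ofstream out(path, std::ios::trunc);
  out << content;
  return static_cast<bool>(out.flush());
}

// Creates my.patterns, my.user-words and a config file inside `dir` and
// returns the absolute path of the config file.
fs::path WriteUserDictConfig(const fs::path& dir) {
  const fs::path base = fs::absolute(dir);
  fs::create_directories(base);
  const fs::path patterns = base / "my.patterns";
  const fs::path words = base / "my.user-words";
  const fs::path config = base / "my.config";  // hypothetical file name
  WriteTextFile(patterns, "\\A\\d\\d\\d\\d\\A\\A\\d\\d\\d\n");
  WriteTextFile(words, "EXAMPLEWORD\n");  // hypothetical word list entry
  // Same three settings as in the quoted config, but with absolute paths.
  WriteTextFile(config,
                "user_patterns_file " + patterns.string() + "\n" +
                "user_words_file " + words.string() + "\n" +
                "lstm_use_matrix 1\n");
  return config;
}
```

Calling WriteUserDictConfig("some/dir") returns the path of a config file that can then be handed to the command-line tool or to the API.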
>>> --
>> You received this message because you are subscribed to a topic in
>> the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/tesseract-ocr/vN6jRopxB5Y/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to
>> tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/6d3c039b-1d58-427f-b53a-5ef8a3639c40%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>

Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-04 Thread Shree Devi Kumar
I have made a wiki page for using user_patterns with API. Please see
https://github.com/tesseract-ocr/tesseract/wiki/APIExample-user_patterns

You can try similarly for user_words.


Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-04 Thread Jochen Naumann
user_words_file also does not work; the file is not loaded (checked with a
file monitor).




Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Zdenko Podobny
If the command line works for you, the easiest way is to follow the tesseract
executable's code [1].
IMO you need to use the variable user_words_file; AFAIR user_words_suffix specifies
only the file extension...
Then it should work [2], i.e. tesseract will load the user words (the effect on
recognition is another topic).

[1]
https://github.com/tesseract-ocr/tesseract/blob/4c8b7d5e3539bae18eb8337d5ebc1fccf56c1f93/src/api/tesseractmain.cpp#L357
[2]
https://github.com/tesseract-ocr/tesseract/blob/aa78a720a34708eece6e498c32e3593a24aa1e74/src/dict/dict.cpp#L254


Zdenko
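
Following the suggestion above to do what the executable does: as far as I understand it, the command-line tool hands its config files to TessBaseAPI::Init, so the variables are already set while Init loads the dictionaries, whereas a SetVariable call made after Init presumably comes too late for that. The sketch below passes a config file the same way. It is a sketch under assumptions, not a verified fix: the language "eng", OEM_LSTM_ONLY, the USE_TESSERACT macro, and the helper names are all my choices. The API part is only compiled when USE_TESSERACT is defined and libtesseract is linked.

```cpp
#include <string>
#include <vector>

// Converts config file names to the `char**` form that TessBaseAPI::Init
// takes. The pointers refer into `configs`, which must outlive the Init call.
std::vector<char*> ToConfigArgv(std::vector<std::string>& configs) {
  std::vector<char*> argv;
  argv.reserve(configs.size());
  for (std::string& c : configs) argv.push_back(c.data());  // C++17 data()
  return argv;
}

#ifdef USE_TESSERACT  // hypothetical switch; define it when linking libtesseract
#include <tesseract/baseapi.h>

// Initializes `api` with the given config files (for example one that sets
// user_words_file and user_patterns_file). Returns true on success.
bool InitWithConfigs(tesseract::TessBaseAPI& api,
                     std::vector<std::string> configs) {
  std::vector<char*> argv = ToConfigArgv(configs);
  // Init returns 0 on success. "eng" and OEM_LSTM_ONLY are assumptions.
  return api.Init(nullptr, "eng", tesseract::OEM_LSTM_ONLY, argv.data(),
                  static_cast<int>(argv.size()), nullptr, nullptr,
                  false) == 0;
}
#endif
```

A caller would create a tesseract::TessBaseAPI object, call InitWithConfigs(api, {"/path/to/my.config"}) with its own config path, and then use a file monitor as described in this thread to check whether the user-words file is actually opened.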




Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Jochen Naumann
Thanks, I already tried api->SetVariable("user_words_suffix", "user-words");
It did not work, whereas specifying it in a config file and using the
command-line tesseract tool does work.
I used a file monitor tool to see whether the process tries to open a user-words
file, but it did not. The tesseract tool, however, does.
I am aware of https://github.com/tesseract-ocr/tesseract/issues/960,
but I am using 4.1, where this is fixed.
Do you have a working example?





[tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Quan Nguyen
https://github.com/tesseract-ocr/tesseract/wiki/APIExample
https://github.com/tesseract-ocr/tesseract/issues/960

api->SetVariable("user_words_suffix", "user-words");


On Wednesday, July 3, 2019 at 10:29:59 AM UTC-5, Jochen Naumann wrote:
>
> Hi, I can set the user-words file on the command line with the tesseract tool,
> but how do I set this using the API?
> I searched for it in the source code but could not find it; I would
> appreciate any help.
>
>
