Re: [tesseract-ocr] "Can't encode transcript" error when using "lstmtraining" command with Tess4.0

2017-08-04 Thread robertyoung0511
Hi, Shree,

I have also tried the new traineddata for recognizing Simplified Chinese on
Linux (Ubuntu), and it works, but the new traineddata does not seem to be
supported on Windows.

With the new traineddata on Ubuntu, some special symbols still cannot be
recognized, such as '∠', '△', '≌', and '≥', among others.

I would like to improve recognition of these special symbols, but I have not
found a good way to do it yet. Can you give me some advice?

Thanks.
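A useful first check is whether these symbols appear in the model's unicharset at all. A character with no unicharset entry has no output class, so the LSTM cannot recognize it.

The bash functions below are a hypothetical sketch. `unichar_present` and `report_symbols` are illustrative names, not Tesseract tools. They assume:

- the plain-text unicharset format, where line 1 holds the entry count and the first space-separated field of each later line is the character;
- a unicharset file already unpacked from the traineddata, for example with `combine_tessdata -u chi_sim.traineddata chi_sim.`. Newer traineddata files then yield a `chi_sim.lstm-unicharset` component; older 4.0 alpha files may not contain one.

```bash
# Hypothetical helpers (not Tesseract tools). They assume a plain-text
# unicharset: line 1 holds the entry count, and the first space-separated
# field of every later line is the character (unichar) itself.

# Succeeds if character $2 has an entry in unicharset file $1.
unichar_present() {
  local unicharset=$1 char=$2
  tail -n +2 "$unicharset" | cut -d' ' -f1 | grep -qxF -- "$char"
}

# Prints "<symbol> present" or "<symbol> MISSING" for each symbol given
# after the unicharset file name.
report_symbols() {
  local unicharset=$1
  shift
  local sym
  for sym in "$@"; do
    if unichar_present "$unicharset" "$sym"; then
      printf '%s present\n' "$sym"
    else
      printf '%s MISSING\n' "$sym"
    fi
  done
}
```

For example, `report_symbols chi_sim.lstm-unicharset '∠' '△' '≌' '≥'` prints one present/MISSING line per symbol. Any symbol reported as MISSING can only be learned by training with a unicharset that includes it.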

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1985a9ff-316f-4e98-bcc6-58880214ab82%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


2017-08-04 Thread robertyoung0511
I have tried the new traineddata on Linux (Ubuntu). It works, but the new
traineddata does not seem to be supported on Windows.


2017-08-01 Thread robertyoung0511
When I use the new traineddata, it reports an error: cannot find the
chi_sim.traineddata. Does the new traineddata only support the Tess4.0 alpha
release? I am using the newest code release.


2017-08-01 Thread robertyoung0511
OK, I will give it a try. Thanks.


2017-08-01 Thread ShreeDevi Kumar
Ray has uploaded new traineddata files to
https://github.com/tesseract-ocr/tessdata/tree/master/best

Why don't you first try recognition with those?

ShreeDevi

Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com


2017-08-01 Thread robertyoung0511
Hello, Shree:

Sorry to trouble you again, but can I use more than one unicharset, such as
chi_sim and eng, for fine-tuning? Some of the special characters may be in
other unicharsets. If I can find them, I could train my traineddata with
several unicharsets, and the special characters would then be encodable.

Thanks, and I look forward to your reply.
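A hypothetical bash helper can show which existing unicharset, if any, already covers a given special character by scanning several unicharset files. `find_in_unicharsets` is an illustrative name, not a Tesseract tool. The sketch assumes:

- the plain-text unicharset format, where line 1 holds the entry count and the character is the first space-separated field of each later line;
- files already unpacked from the respective traineddata.

```bash
# Hypothetical helper (not a Tesseract tool). Assumes plain-text unicharset
# files: line 1 holds the entry count, and the first space-separated field of
# every later line is the character (unichar) itself.
# Prints the name of each unicharset file (arguments 2..n) that lists
# character $1; prints nothing if none of them does.
find_in_unicharsets() {
  local char=$1
  shift
  local f
  for f in "$@"; do
    if tail -n +2 "$f" | cut -d' ' -f1 | grep -qxF -- "$char"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, `find_in_unicharsets '≥' chi_sim.lstm-unicharset eng.lstm-unicharset` prints the names of the files that list '≥'. Later 4.x releases also ship a `merge_unicharsets` tool for combining unicharset files; check whether your build includes it.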


2017-07-25 Thread robertyoung0511
Thanks for your help.

I will fine-tune with the new traineddata for all languages once it is
available in 2-3 weeks, and report back on any specific characters that are
missing.


2017-07-25 Thread ShreeDevi Kumar
That error occurs because some characters in your training text are not part
of the chi_sim unicharset.
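To see exactly which characters in a training text are the problem, you can compare the text against the unicharset. The bash function below is a hypothetical sketch (`missing_chars` is an illustrative name, and it uses bash process substitution). It lists every non-whitespace character in the text file that has no entry in the unicharset file.

It assumes:

- the plain-text unicharset format, with the entry count on line 1 and the character in the first space-separated field of each later line;
- a UTF-8 training text;
- an available `C.UTF-8` locale.

It compares single code points, so unichars made of several code points are not matched.

```bash
# Hypothetical helper (not a Tesseract tool): print, one per line, each
# non-whitespace character of text file $2 that has no entry in unicharset
# file $1.
missing_chars() {
  local unicharset=$1 text=$2
  (
    # A UTF-8 locale lets grep split multibyte characters correctly and gives
    # sort and comm one consistent ordering.
    export LC_ALL=C.UTF-8
    # comm -23 keeps only lines unique to the first list, i.e. characters
    # found in the text but absent from the unicharset (line 1, the entry
    # count, is skipped with tail).
    comm -23 \
      <(grep -o '[^[:space:]]' "$text" | sort -u) \
      <(tail -n +2 "$unicharset" | cut -d' ' -f1 | sort -u)
  )
}
```

For example, `missing_chars chi_sim.lstm-unicharset chi.training_text` (both file names hypothetical) prints each character of the training text that has no chi_sim unicharset entry. Those are the characters that need either a different base model or a replaced top layer.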

You are trying fine-tune training, which will give this error. Replacing the
top layer will work.
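Replacing the top layer means keeping the lower layers of the existing chi_sim network and training a new output layer for a unicharset that includes the extra characters. lstmtraining does this with `--continue_from` together with two flags:

- `--append_index`: the layer index at which the existing network is cut;
- `--net_spec`: the layers appended in place of the removed part.

The bash helper below is a hypothetical sketch (`top_layer_spec` is an illustrative name). It builds such a spec using the entry count on line 1 of the new unicharset as the class count. The `Lfx256` layer follows an example in the 2017 training tutorial and is an assumption, not a requirement.

```bash
# Hypothetical helper (not a Tesseract tool): print a VGSL --net_spec string
# for a replacement top layer. Line 1 of a plain-text unicharset holds its
# entry count, used here as the class count of the 1-D CTC output layer
# (O1c<N>). The LSTM width defaults to 256, an assumption taken from a
# tutorial example.
top_layer_spec() {
  local unicharset=$1 lstm_width=${2:-256}
  local n
  n=$(head -n 1 "$unicharset" | tr -d '[:space:]')
  printf '[Lfx%s O1c%s]\n' "$lstm_width" "$n"
}
```

The result would be passed as `--net_spec "$(top_layer_spec new.unicharset)"`, with `--append_index` chosen for the layer layout of the base network. How the new unicharset itself is supplied differs between builds: early 4.0 alpha tutorials pass it with `-U`, while later releases use `--traineddata`. Check the training documentation for your version.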

I suggest that you wait 2-3 weeks for Ray to upload new traineddata for all
languages.

You can tell us if any specific characters are missing from the existing
traineddata.

ShreeDevi

On Tue, Jul 25, 2017 at 12:46 PM,  wrote:

> Hello,
>
> I used the following command to train my own traineddata:
>
> lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
>   --continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
>   --train_listfile ~/tesstutorial/chitest/chi.training_files.txt \
>   --eval_listfile ~/tesstutorial/chitest/chi.training_files.txt \
>   --target_error_rate 0.01
>
> Tess4.0 reports an error, shown in the attached screenshot: it says
> "Can't encode transcript" for text such as "化简(-x2)3的结果是..."
> ("the result of simplifying (-x2)3 is...").
> Why? Can you help me?