[tesseract-ocr] Re: Does Tesseract training have an upper limit on CPU use? Does more CPU mean faster training?

2018-12-09 Thread bruce
Hi Junye,
I now have a workstation with 36 cores (Intel(R) Xeon(R) E7-4820 v2, 
2.00GHz), 32 GB of memory, running RHEL 7.3.

My training text is about *29MB* and contains *9470568* characters.
Each .tif file is about 2.5GB; the file sizes generated for different fonts 
differ slightly. It takes about *12 hours* to generate a .tif file.
It takes about *40 hours* to generate one .lstmf file from a .tif file.

These are my commands:
/usr/local/bin/tesseract \
  /root/tesseract_train/tif_and_box/lyq_chn.ReejiCloudYuanXiGBK.exp0.tif \
  /root/tesseract_train/lstm/aaa/ReejiCloudYuanXiGBK.exp0 \
  /usr/share/tesseract/4/tessdata/configs/lstm.train \
  /usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config \
  > /root/tesseract_train/lstmlogs/ReejiCloudYuanXiGBK.log 2>&1

/usr/local/bin/tesseract \
  /root/tesseract_train/tif_and_box/lyq_chn.MSmartPRC.exp0.tif \
  /root/tesseract_train/lstm/aaa/MSmartPRC.exp0 \
  /usr/share/tesseract/4/tessdata/configs/lstm.train \
  /usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config \
  > /root/tesseract_train/lstmlogs/MSmartPRC.log 2>&1

/usr/local/bin/tesseract \
  /root/tesseract_train/tif_and_box/lyq_chn.SimSun.exp0.tif \
  /root/tesseract_train/lstm/aaa/SimSun.exp0 \
  /usr/share/tesseract/4/tessdata/configs/lstm.train \
  /usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config \
  > /root/tesseract_train/lstmlogs/SimSun.log 2>&1

As shown in the screenshot:
[image: training.png]

*I found that a single tesseract process uses only one core.*

Here is the output of tesseract --version:
[image: 234.png]

*This is too time-consuming. Is there any way to speed it up?*
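
Since each tesseract process above was observed to use one core and the three commands are independent of each other, one possible approach is to run one process per font at the same time. The following is a minimal sketch, not a tested recipe: the paths and font names are copied from the commands above, while the helper names (build_command, run_font, run_all) and the worker count are hypothetical. It also assumes the machine has enough memory to hold several 2.5GB .tif files at once, which should be checked before raising the worker count.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Paths copied from the commands in this message.
TESSERACT = "/usr/local/bin/tesseract"
TIF_DIR = "/root/tesseract_train/tif_and_box"
OUT_DIR = "/root/tesseract_train/lstm/aaa"
LOG_DIR = "/root/tesseract_train/lstmlogs"
CONFIGS = [
    "/usr/share/tesseract/4/tessdata/configs/lstm.train",
    "/usr/share/tesseract/4/tessdata/scripts/lang/lyq_chn/lyq_chn.config",
]
FONTS = ["ReejiCloudYuanXiGBK", "MSmartPRC", "SimSun"]


def build_command(font):
    """Build the argument list for one font (hypothetical helper)."""
    tif = f"{TIF_DIR}/lyq_chn.{font}.exp0.tif"
    out_base = f"{OUT_DIR}/{font}.exp0"
    return [TESSERACT, tif, out_base, *CONFIGS]


def run_font(font):
    """Run tesseract for one font, sending stdout and stderr to its log."""
    with open(f"{LOG_DIR}/{font}.log", "w") as log:
        result = subprocess.run(
            build_command(font), stdout=log, stderr=subprocess.STDOUT
        )
    return result.returncode


def run_all(fonts, max_workers):
    """Run up to max_workers tesseract processes at the same time."""
    # Threads only wait on the child processes, so the CPU work
    # happens in the separate tesseract processes, one per font.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(fonts, pool.map(run_font, fonts)))
```

Calling run_all(FONTS, max_workers=3) would start the three commands together and return a mapping from font name to exit code. This does not make any single .tif finish sooner; it only overlaps the per-font runs, so the benefit grows with the number of fonts.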

On Tuesday, November 27, 2018 at 5:27:44 PM UTC+8, Junye Li wrote:
>
> I don't think that would be the case unless your training text is a few 
> hundred megabytes in size...
>
> I am running Tesseract on Ubuntu 18.04, and based on a very quick test, 
> Tesseract on Ubuntu performed better than on Windows in terms of 
> agreement accuracy (I'm training it for handwriting). 
>
> As for the training, it took probably around 5 minutes to complete 2000 
> iterations for me (each training sample is about 500 English characters long). 
>
> Cheers,
> Junye
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/170e0726-c48c-4006-8848-63723d54257e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] What is the information in basetrain.log

2018-12-09 Thread Khosrobeigy.zohreh
I have read these pages, but I am confused about the output of the
convolution. I want to know what the output of the convolution is.
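
One way to read the quoted log, offered as a sketch rather than a confirmed description of Tesseract's internals: the header "Num outputs,weights in Series:" suggests that each indented line has the form name:outputs, weights. Under that assumption, the lines for the convolution stage can be parsed as below. The function name parse_layer_line is hypothetical; the log lines are copied from the quoted message.

```python
def parse_layer_line(line):
    """Split a 'name:outputs, weights' log line into its three parts."""
    # The layer name may itself contain commas (e.g. "C3,3"),
    # so split on the last colon first.
    name, _, rest = line.strip().rpartition(":")
    outputs, weights = (int(field) for field in rest.split(","))
    return name, outputs, weights


# Lines copied from the convolution part of the quoted log.
conv_lines = [
    "  C3,3:9, 0",
    "  Ft16:16, 160",
    "  [C3,3Ft16]:16, 160",
    "  Mp3,3:16, 0",
]
conv_stage = [parse_layer_line(line) for line in conv_lines]

for name, outputs, weights in conv_stage:
    print(f"{name}: {outputs} outputs, {weights} weights")
```

Read this way, C3,3 reports 9 outputs and no weights, which is consistent with a 3x3 window over a single input channel, and Ft16 turns those 9 values into 16 outputs. The combined [C3,3Ft16] series, which corresponds to Ct3,3,16 in the requested network spec, therefore reports 16 outputs, and Mp3,3 keeps that depth at 16.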

On Sun, 9 Dec 2018, 9:34 pm Lorenzo Bolzani wrote:
> You can find some details here:
>
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
> https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00
>
>
> Lorenzo
>
>
> On Sun, 9 Dec 2018 at 18:02, Zohreh Khosrobeygi <
> beigy.zoh...@gmail.com> wrote:
>
>> Hi,
>> Does anyone know what the information in the log file created during
>> training means?
>> Warning: given outputs 1 not equal to unicharset of 165.
>> Num outputs,weights in Series:
>>   1,48,0,1:1, 0
>> Num outputs,weights in Series:
>>   C3,3:9, 0
>>   Ft16:16, 160
>> Total weights = 160
>>   [C3,3Ft16]:16, 160
>>   Mp3,3:16, 0
>>   Lbys64:128, 41472
>>   Lbx128:256, 263168
>>   Lby256:512, 1050624
>>   Lbx512:1024, 4198400
>>   Fc165:165, 169125
>> Total weights = 5722949
>> Built network:[1,48,0,1[C3,3Ft16]Mp3,3Lbys64Lbx128Lby256Lbx512Fc165] from
>> request [1,48,0,1Ct3,3,16Mp3,3Lbys64Lbx128Lby256Lbx512O1c1]
>> Especially this part:
>> Num outputs,weights in Series:
>>   1,48,0,1:1, 0
>> Num outputs,weights in Series:
>>   C3,3:9, 0
>>   Ft16:16, 160
>> Total weights = 160
>>   [C3,3Ft16]:16, 160
>>   Mp3,3:16, 0
>>
>> Thanks for your help.
>>


Re: [tesseract-ocr] What is the information in basetrain.log

2018-12-09 Thread Lorenzo Bolzani
You can find some details here:

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00


Lorenzo


On Sun, 9 Dec 2018 at 18:02, Zohreh Khosrobeygi <
beigy.zoh...@gmail.com> wrote:

> Hi,
> Does anyone know what the information in the log file created during
> training means?
> Warning: given outputs 1 not equal to unicharset of 165.
> Num outputs,weights in Series:
>   1,48,0,1:1, 0
> Num outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lbys64:128, 41472
>   Lbx128:256, 263168
>   Lby256:512, 1050624
>   Lbx512:1024, 4198400
>   Fc165:165, 169125
> Total weights = 5722949
> Built network:[1,48,0,1[C3,3Ft16]Mp3,3Lbys64Lbx128Lby256Lbx512Fc165] from
> request [1,48,0,1Ct3,3,16Mp3,3Lbys64Lbx128Lby256Lbx512O1c1]
> Especially this part:
> Num outputs,weights in Series:
>   1,48,0,1:1, 0
> Num outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>
> Thanks for your help.
>


[tesseract-ocr] What is the information in basetrain.log

2018-12-09 Thread Zohreh Khosrobeygi
Hi, 
Does anyone know what the information in the log file created during 
training means?
Warning: given outputs 1 not equal to unicharset of 165.
Num outputs,weights in Series:
  1,48,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lbys64:128, 41472
  Lbx128:256, 263168
  Lby256:512, 1050624
  Lbx512:1024, 4198400
  Fc165:165, 169125
Total weights = 5722949
Built network:[1,48,0,1[C3,3Ft16]Mp3,3Lbys64Lbx128Lby256Lbx512Fc165] from 
request [1,48,0,1Ct3,3,16Mp3,3Lbys64Lbx128Lby256Lbx512O1c1]
Especially this part:
Num outputs,weights in Series:
  1,48,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0

Thanks for your help.
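
The weight counts in the log above can be reproduced with a little arithmetic. The sketch below assumes that each fully connected layer has one weight per input-output pair plus one bias per output, and that each LSTM has four gates that each see the input, the recurrent state, and a bias, with the bidirectional layers (Lb...) holding two such LSTMs. These formulas are an assumption about how the numbers arise, not taken from the Tesseract source; the function names are hypothetical, and the layer sizes are read from the log.

```python
def fully_connected_weights(n_in, n_out):
    """One weight per input-output pair, plus one bias per output."""
    return n_in * n_out + n_out


def lstm_weights(n_in, n_out):
    """Four gates, each connected to the input, the state, and a bias."""
    return 4 * n_out * (n_in + n_out + 1)


def bidi_lstm_weights(n_in, n_out):
    """A bidirectional layer holds one LSTM per direction."""
    return 2 * lstm_weights(n_in, n_out)


# (layer name, computed weights); input sizes are the previous
# layer's output count as printed in the log.
layers = [
    ("Ft16", fully_connected_weights(9, 16)),      # log: 160
    ("Lbys64", bidi_lstm_weights(16, 64)),         # log: 41472
    ("Lbx128", bidi_lstm_weights(128, 128)),       # log: 263168
    ("Lby256", bidi_lstm_weights(256, 256)),       # log: 1050624
    ("Lbx512", bidi_lstm_weights(512, 512)),       # log: 4198400
    ("Fc165", fully_connected_weights(1024, 165)), # log: 169125
]
total = sum(weights for _, weights in layers)

for name, weights in layers:
    print(f"{name}: {weights}")
print(f"Total weights = {total}")  # log: 5722949
```

Under these assumptions every per-layer figure and the total of 5722949 match the log, which suggests that the first "Total weights = 160" line is simply the subtotal for the inner [C3,3Ft16] series, and that C3,3 and Mp3,3 contribute no trainable weights.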



[tesseract-ocr] CNN and Tensorflow in Tesseract

2018-12-09 Thread Zohreh Khosrobeygi
I have some questions:
1- How many layers does the CNN have in TensorFlow?
2- What is the stride in the convolution and pooling layers?
3- Does the convolution use zero padding?

I'm training the Persian language and my accuracy is already good, but I 
need to increase it. Convolution is very important in my training.
Does anyone know the answers?
