Re: [tesseract-ocr] What is the information in basetrain.log

2018-12-09 Thread Khosrobeigy.zohreh
I have read these pages, but I am confused about the output of the convolution.
Which value in the log is the output of the convolution?
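
One way to read the quoted log below is as "outputs, weights" pairs per layer. Read this way, C3,3 has 9 outputs and no weights (a 3x3 window over the 1-channel input), and Ft16 turns those 9 values into 16 outputs using (9 + 1) x 16 = 160 weights. The Python sketch below reproduces every weight count in the log. The formulas are an assumption inferred from the numbers (a fully connected layer with bias, and an LSTM with four gates per direction), not taken from the Tesseract source, and the helper names are made up for illustration.

```python
# Reproduce the "outputs, weights" pairs printed in the training log.
# Assumption: each LSTM direction has 4 gates, and every gate sees the
# layer input, the previous state, and one bias term.

def fc_weights(n_in: int, n_out: int) -> int:
    """Fully connected layer: one weight per input plus a bias, per output."""
    return (n_in + 1) * n_out


def lstm_weights(n_in: int, n_states: int, bidirectional: bool = False) -> int:
    """LSTM layer: 4 gates, each with n_in + n_states + 1 weights per state."""
    one_direction = 4 * n_states * (n_in + n_states + 1)
    return 2 * one_direction if bidirectional else one_direction


layers = []
depth = 3 * 3 * 1  # C3,3 stacks a 3x3 window of the 1-channel input: 9 outputs
layers.append(("Ft16", 16, fc_weights(depth, 16)))
depth = 16  # Mp3,3 keeps the depth and adds no weights
for name, states in [("Lbys64", 64), ("Lbx128", 128), ("Lby256", 256), ("Lbx512", 512)]:
    layers.append((name, 2 * states, lstm_weights(depth, states, bidirectional=True)))
    depth = 2 * states  # a bidirectional layer concatenates both directions
layers.append(("Fc165", 165, fc_weights(depth, 165)))

for name, outputs, weights in layers:
    print(f"{name}:{outputs}, {weights}")
total = sum(weights for _, _, weights in layers)
print("Total weights =", total)
```

The printed total, 5722949, matches the "Total weights" line in the log.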

On Sun, 9 Dec 2018, 9:34 pm Lorenzo Bolzani wrote:
> You can find some details here:
>
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
> https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00
>
>
> Lorenzo
>
>
> On Sun, 9 Dec 2018 at 18:02, Zohreh Khosrobeygi <
> beigy.zoh...@gmail.com> wrote:
>
>> Hi,
>> Does anyone know what the information in the log file created during
>> training means?
>> Warning: given outputs 1 not equal to unicharset of 165.
>> Num outputs,weights in Series:
>>   1,48,0,1:1, 0
>> Num outputs,weights in Series:
>>   C3,3:9, 0
>>   Ft16:16, 160
>> Total weights = 160
>>   [C3,3Ft16]:16, 160
>>   Mp3,3:16, 0
>>   Lbys64:128, 41472
>>   Lbx128:256, 263168
>>   Lby256:512, 1050624
>>   Lbx512:1024, 4198400
>>   Fc165:165, 169125
>> Total weights = 5722949
>> Built network:[1,48,0,1[C3,3Ft16]Mp3,3Lbys64Lbx128Lby256Lbx512Fc165] from
>> request [1,48,0,1Ct3,3,16Mp3,3Lbys64Lbx128Lby256Lbx512O1c1]
>> Especially this part:
>> Num outputs,weights in Series:
>>   1,48,0,1:1, 0
>> Num outputs,weights in Series:
>>   C3,3:9, 0
>>   Ft16:16, 160
>> Total weights = 160
>>   [C3,3Ft16]:16, 160
>>   Mp3,3:16, 0
>>
>> Thanks for your help.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/1197c56d-aa4d-4e82-8d4d-9ad4fa9e2449%40googlegroups.com
>> 
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>



Re: [tesseract-ocr] the length of input to lstm

2018-12-04 Thread Khosrobeigy.zohreh
I've read these pages several times, but they don't say anything about the
output of TensorFlow or the input of the LSTM.
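
For the network in the quoted question below, the number of inputs to each LSTM can be traced by following the depth (values per position) through the layers. The sketch below does this in Python. It assumes that the convolution does not shrink the image, that Mp3,3 divides the height by 3, and that the summarizing LSTM (the "s" in Lfys64) collapses the height to 1; these assumptions come from a reading of the VGSL notation and should be checked against the wiki. The width is left out because it varies per text line (the 0 in the input spec), and it would be reduced by roughly the same factor of 3 at the max-pool.

```python
# Trace (height, depth) through
# [1,48,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx192Fc165].
# All shape rules here are assumptions read off the VGSL notation.

height, depth = 48, 1  # input spec 1,48,0,1: height 48, 1 channel
trace = [("input", height, depth)]

depth = 16  # [C3,3Ft16]: 3x3 window, 16 outputs per position
trace.append(("[C3,3Ft16]", height, depth))

height //= 3  # Mp3,3: 3x3 max-pool, depth unchanged
trace.append(("Mp3,3", height, depth))

lstm_inputs = {}
for name, states, summarizes in [
    ("Lfys64", 64, True),
    ("Lfx96", 96, False),
    ("Lrx96", 96, False),
    ("Lfx192", 192, False),
]:
    lstm_inputs[name] = depth  # values fed to the LSTM at each time step
    if summarizes:
        height = 1  # the summarizing LSTM keeps only its final output
    depth = states
    trace.append((name, height, depth))

fc_inputs = depth  # Fc165 sees the 192 outputs of the last LSTM
trace.append(("Fc165", height, 165))

for name, h, d in trace:
    print(f"{name:12s} height={h:2d} depth={d}")
print("LSTM inputs per time step:", lstm_inputs)
```

Under these assumptions the first LSTM receives 16 values per time step (the depth of the convolution output), and the following LSTMs receive 64, 96 and 96.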
*Kind regards,*
*Zohreh Khosrobeygi*

*Student of IT*

*University of Tehran, 2016*

*Phone: (+98)9196042887*

*Email:khosrobeygi.zo...@ut.ac.ir *



On Wed, Dec 5, 2018 at 12:32 AM Shree Devi Kumar 
wrote:

> See
>
> https://github.com/tesseract-ocr/tesseract/wiki/VGSLSpecs
>
> https://github.com/tesseract-ocr/tesseract/wiki/NeuralNetsInTesseract4.00
>
> On Tue, 4 Dec 2018, 15:14 Zohreh Khosrobeygi  wrote:
>
>> I'm training Tesseract from scratch for the Persian language, but I need
>> to know the output of the TF convolution, because it is the input of the LSTM.
>> The wiki gives, for example, Ct5,5,32. I couldn't understand the number of
>> outputs. In this case 32 is the depth, but what is the number of outputs?
>> Besides, elsewhere it says 32 is the number of filters. Can anyone explain
>> it to me?
>> In summary, when the network is:
>> [1,48,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx192Fc165]
>> what are the numbers of inputs?
>>



Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-28 Thread Khosrobeigy.zohreh
I found the problem.
After generating the data, I copied it to another location.
When I use the copied files I get the error, but when I use the original
(uncopied) data I have no problem.
Thank you for your help.
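
Since the error appeared only with the copied files, one way to check whether a copy is damaged or incomplete is to compare checksums of the originals and the copies. The Python sketch below is one possible way to do that; the function names and the `*.lstmf` pattern are illustrative assumptions, and it does not establish why the copy differed.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large training files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(original_dir: Path, copy_dir: Path, pattern: str = "*.lstmf") -> List[str]:
    """Return names of files that are missing from copy_dir or differ from the original."""
    bad = []
    for original in sorted(original_dir.glob(pattern)):
        copy = copy_dir / original.name
        if not copy.is_file() or sha256_of(copy) != sha256_of(original):
            bad.append(original.name)
    return bad
```

Calling `find_bad_copies(Path("originals"), Path("copies"))` with the two directories (hypothetical names here) returns the files worth regenerating or copying again.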



On Tue, Nov 27, 2018 at 6:55 PM Shree Devi Kumar 
wrote:

> You make a good point, zdenko.
>
> If there are limitations on training data to be used or minimum memory
> requirements for handling such data for doing custom training, it will be
> good to document them in the wiki, so that people do not waste time and
> effort in training if they don't have the minimum hardware requirements.
>
> On Tue, 27 Nov 2018, 08:49 Zdenko Podobny wrote:
>> Yes, you can ;-)
>> If you want to document it, you need to find the reason for the error.
>> If you want to find the reason, you need to dive into 130 GB of input data...
>> Enjoy.
>>
>> IMO the right suggestion is to ask the user to find the file/data that causes
>> the problem and to create minimal input data that demonstrates it. Creating an
>> issue without a test case (for reproducing the problem) is useless and demotivating.
>>
>> Zdenko
>>
>>
>> On Tue, 27 Nov 2018 at 13:23, Shree Devi Kumar
>> wrote:
>>
>>> In my opinion, the assert still needs to be documented as an issue, with
>>> LSTM training.
>>>
>>> On Tue, 27 Nov 2018, 05:03 Zdenko Podobny wrote:
>>>> Shree,
>>>>
>>>> the issue tracker is not for custom training, simply because there are not
>>>> enough people and it cannot be reproduced...
>>>> Did you read:  "I have been runnig about 130G data which are 4000
>>>> files"?
>>>> Unless you are able to reproduce the problem with very small data,
>>>> IMO nobody will be willing to look at the issue.
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Mon, 26 Nov 2018 at 23:38, Shree Devi Kumar
>>>> wrote:
>>>>
>>>>> If you have the problem with the master version also, please open an
>>>>> issue on github.
>>>>>
>>>>> Please include a stack trace/debug information also.
>>>>>
>>>>> On Mon, 26 Nov 2018, 10:48 Khosrobeigy.zohreh wrote:
>>>>>
>>>>>> And I have the same problem again.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 26, 2018 at 6:45 PM Khosrobeigy.zohreh <
>>>>>> beigy.zoh...@gmail.com> wrote:
>>>>>>
>>>>>>> I downloaded tesseract-master from GitHub and reinstalled it, and
>>>>>>> now my version is:
>>>>>>> tesseract 4.0.0
>>>>>>>  leptonica-1.76.0
>>>>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>>>>>>  Found AVX2
>>>>>>>  Found AVX
>>>>>>>  Found SSE
>>>>>>> Is that correct?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 26, 2018 at 5:25 PM Shree Devi Kumar <
>>>>>>> shreesh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Please update to the latest version from github and try.
>>>>>>>>
>>>>>>>> On Mon, 26 Nov 2018, 08:36 Khosrobeigy.zohreh <
>>>>>>>> beigy.zoh...@gmail.com wrote:
>>>>>>>>
>>>>>>>>> tesseract 4.0.0-beta.4
>>>>>>>>>  leptonica-1.76.0
>>>>

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Khosrobeigy.zohreh
And I have the same problem again.



On Mon, Nov 26, 2018 at 6:45 PM Khosrobeigy.zohreh 
wrote:

> I downloaded tesseract-master from GitHub and reinstalled it, and now
> my version is:
> tesseract 4.0.0
>  leptonica-1.76.0
>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>  Found AVX2
>  Found AVX
>  Found SSE
> Is that correct?
>
>
>
>
> On Mon, Nov 26, 2018 at 5:25 PM Shree Devi Kumar 
> wrote:
>
>> Please update to the latest version from github and try.
>>
>> On Mon, 26 Nov 2018, 08:36 Khosrobeigy.zohreh wrote:
>>
>>> tesseract 4.0.0-beta.4
>>>  leptonica-1.76.0
>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>>  Found AVX2
>>>  Found AVX
>>>  Found SSE
>>>
>>>
>>>
>>>
>>> On Mon, Nov 26, 2018 at 3:33 PM Shree Devi Kumar 
>>> wrote:
>>>
>>>> What is the version of tesseract?
>>>>
>>>>
>>>> tesseract -v
>>>>
>>>> On Mon, 26 Nov 2018, 05:51 Zohreh Khosrobeygi wrote:
>>>>
>>>>> Hi,
>>>>> I have been running training on about 130 GB of data, in 4000 files. My command is:
>>>>>
>>>>> /home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
>>>>>   --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
>>>>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
>>>>>   --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
>>>>>   --learning_rate 0.001 \
>>>>>   --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
>>>>>   --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>>>>   --max_iterations 15 \
>>>>>   &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log
>>>>>
>>>>> but after reading some files, Tesseract gives this error and stops
>>>>> training:
>>>>>
>>>>> Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
>>>>> lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
>>>>> Could you please help me?
>>>>>

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Khosrobeigy.zohreh
I downloaded tesseract-master from GitHub and reinstalled it, and now my
version is:
tesseract 4.0.0
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
 Found AVX2
 Found AVX
 Found SSE
Is that correct?




On Mon, Nov 26, 2018 at 5:25 PM Shree Devi Kumar 
wrote:

> Please update to the latest version from github and try.
>
> On Mon, 26 Nov 2018, 08:36 Khosrobeigy.zohreh  wrote:
>
>> tesseract 4.0.0-beta.4
>>  leptonica-1.76.0
>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>>
>>
>>
>>
>> On Mon, Nov 26, 2018 at 3:33 PM Shree Devi Kumar 
>> wrote:
>>
>>> What is the version of tesseract?
>>>
>>>
>>> tesseract -v
>>>
>>> On Mon, 26 Nov 2018, 05:51 Zohreh Khosrobeygi wrote:
>>>
>>>> Hi,
>>>> I have been running training on about 130 GB of data, in 4000 files. My command is:
>>>>
>>>> /home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
>>>>   --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
>>>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
>>>>   --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
>>>>   --learning_rate 0.001 \
>>>>   --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
>>>>   --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>>>   --max_iterations 15 \
>>>>   &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log
>>>>
>>>> but after reading some files, Tesseract gives this error and stops
>>>> training:
>>>>
>>>> Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
>>>> lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
>>>> Could you please help me?
>>>>

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Khosrobeigy.zohreh
tesseract 4.0.0-beta.4
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
 Found AVX2
 Found AVX
 Found SSE




On Mon, Nov 26, 2018 at 3:33 PM Shree Devi Kumar 
wrote:

> What is the version of tesseract?
>
>
> tesseract -v
>
> On Mon, 26 Nov 2018, 05:51 Zohreh Khosrobeygi  wrote:
>
>> Hi,
>> I have been running training on about 130 GB of data, in 4000 files. My command is:
>>
>> /home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
>>   --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
>>   --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
>>   --learning_rate 0.001 \
>>   --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
>>   --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>   --max_iterations 15 \
>>   &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log
>>
>> but after reading some files, Tesseract gives this error and stops
>> training:
>>
>> Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
>> lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
>> Could you please help me?
>>



Re: [tesseract-ocr] Compute CTC targets failed while training

2018-09-26 Thread Khosrobeigy.zohreh
No, I always train from scratch.
The best and fast traineddata don't recognize English and Persian, and the
accuracy is too low for some fonts.
I want to solve this problem.
Can fine-tuning use a different unicharset? As I read in the Tesseract wiki,
the unicharset size is the number of output classes of the LSTM. So if
Mr. Smith has trained with, for example, a unicharset of 120 entries, can I
have a unicharset of 160 entries when fine-tuning?
As far as I know, the number of classes in an LSTM cannot change.
All the English, Persian and punctuation characters together come to around
164 characters.
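
A rough figure like the 164 characters above can be estimated by counting the distinct characters in the training text. The Python sketch below shows one way to do this; it is only an approximation, because the unicharset that the Tesseract training tools build may contain extra entries or group characters differently. The function name and the NFC normalization step are assumptions made for this illustration.

```python
import unicodedata
from collections import Counter
from typing import Iterable


def distinct_characters(lines: Iterable[str]) -> Counter:
    """Count non-whitespace characters after NFC normalization."""
    counts = Counter()
    for line in lines:
        for ch in unicodedata.normalize("NFC", line):
            if not ch.isspace():
                counts[ch] += 1
    return counts


if __name__ == "__main__":
    sample = ["Hello, world!", "سلام دنیا"]
    counts = distinct_characters(sample)
    print(len(counts), "distinct characters")
    for ch, n in counts.most_common(5):
        print(repr(ch), n)
```

In practice the lines would be read from the training text file rather than from the short sample used here.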

On Wed, Sep 26, 2018 at 12:34 PM Shree Devi Kumar 
wrote:

>
> >By version alpha, I trained about 1000 line and it is not so bad
>
> You must have only done fine tuning of model then and now you are trying
> to train from scratch.
>
> On Wed, 26 Sep 2018, 04:01 Khosrobeigy.zohreh, 
> wrote:
>
>> I know; I am actually well versed in LSTMs. I want to resolve all the errors
>> and then train on a large text.
>> With the alpha version, I trained on about 1000 lines and the result was not
>> so bad. But with version beta 4 I get many errors.
>> In alpha:
>> # Use LSTM
>> tessedit_ocr_engine_mode 1
>> tessedit_pageseg_mode 6
>>
>> # Arabic page layout variables
>> segment_nonalphabetic_script 1
>>
>> # Avoid dropping rows
>> textord_noise_rowratio 20.0
>> textord_noise_syfract 0.6
>>
>> textord_min_linesize 2.5
>>
>> # Avoid over-estimating intra-word spacing at both row and
>> # block levels when using old to method
>> tosp_old_to_method T
>> tosp_old_to_constrain_sp_kn T
>> tosp_old_sp_kn_th_factor 4.0
>>
>> tosp_only_small_gaps_for_kern T
>> tosp_use_pre_chopping T
>> I used all of these, but now my model doesn't learn.
>> Has anything changed in beta 4, for example in text2image?
>>
>> On Wed, Sep 26, 2018 at 12:53 AM Shree Devi Kumar 
>> wrote:
>>
>>>   --fontlist "Arial"
>>>
>>> Does that have good coverage for Farsi?
>>>
>>>
>>> --max_iterations 5000
>>>
>>> You are trying to train from scratch with 18000 lines of text and only
>>> 5000 iterations. That will not work.
>>>
>>> Ray has trained on hundreds of thousands of lines of text and millions
>>> of iterations.
>>>
>>> On Tue, 25 Sep 2018, 16:20 Zohreh Khosrobeygi, 
>>> wrote:
>>>
>>>> Hi, I use this:
>>>> tesseract 4.0.0-beta.4
>>>>  leptonica-1.74.4
>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>>>
>>>>  Found AVX2
>>>>  Found AVX
>>>>  Found SSE
>>>> I've trained on about 18000 lines for the Persian language. I use this command:
>>>>
>>>> bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
>>>>   --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
>>>>   --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
>>>>   --linedata_only \
>>>>   --noextract_font_properties \
>>>>   --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
>>>>   --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
>>>>   --fontlist "Arial" \
>>>>   --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2
>>>> and then run this:
>>>> sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
>>>>   --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
>>>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
>>>>   --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
>>>>   --learning_rate 0.001 \
>>>>   --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
>>>>   --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>>>   --max_iterations 5000 \
>>>>   &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log
>>>> but it always shows "Compute CTC targets failed", and the model does not
>>>> perform well at all.
>>>> I normalized my text, and each line of the text has at most 20 tokens.
>>>> Could you please help me?
>>>>
>>>>

Re: [tesseract-ocr] Compute CTC targets failed while training

2018-09-26 Thread Khosrobeigy.zohreh
I know; I am actually well versed in LSTMs. I want to resolve all the errors
and then train on a large text.
With the alpha version, I trained on about 1000 lines and the result was not
so bad. But with version beta 4 I get many errors.
In alpha:
# Use LSTM
tessedit_ocr_engine_mode 1
tessedit_pageseg_mode 6

# Arabic page layout variables
segment_nonalphabetic_script 1

# Avoid dropping rows
textord_noise_rowratio 20.0
textord_noise_syfract 0.6

textord_min_linesize 2.5

# Avoid over-estimating intra-word spacing at both row and
# block levels when using old to method
tosp_old_to_method T
tosp_old_to_constrain_sp_kn T
tosp_old_sp_kn_th_factor 4.0

tosp_only_small_gaps_for_kern T
tosp_use_pre_chopping T
I used all of these, but now my model doesn't learn.
Has anything changed in beta 4, for example in text2image?
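
The point in the quoted reply below, that 5000 iterations are too few for 18000 lines, can be checked with simple arithmetic. The sketch assumes that one training iteration processes one text line; that is an assumption made for this illustration and should be verified against the lstmtraining documentation.

```python
def passes_over_data(num_lines: int, max_iterations: int) -> float:
    """Average number of times each training line is seen.

    Assumes one iteration consumes exactly one training line.
    """
    return max_iterations / num_lines


if __name__ == "__main__":
    ratio = passes_over_data(num_lines=18000, max_iterations=5000)
    print(f"{ratio:.2f} passes over the training data")
```

Under that assumption the run stops after seeing fewer than a third of the lines once, so most of the training text is never used.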

On Wed, Sep 26, 2018 at 12:53 AM Shree Devi Kumar 
wrote:

>   --fontlist "Arial"
>
> Does that have good coverage for Farsi?
>
>
> --max_iterations 5000
>
> You are trying to train from scratch with 18000 lines of text and only
> 5000 iterations. That will not work.
>
> Ray has trained on hundreds of thousands of lines of text and millions of
> iterations.
>
> On Tue, 25 Sep 2018, 16:20 Zohreh Khosrobeygi, 
> wrote:
>
>> Hi, I use this:
>> tesseract 4.0.0-beta.4
>>  leptonica-1.74.4
>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>> I've trained on about 18000 lines for the Persian language. I use this command:
>>
>> bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
>>   --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
>>   --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
>>   --linedata_only \
>>   --noextract_font_properties \
>>   --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
>>   --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
>>   --fontlist "Arial" \
>>   --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2
>> and then run this:
>> sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
>>   --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
>>   --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
>>   --learning_rate 0.001 \
>>   --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
>>   --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>   --max_iterations 5000 \
>>   &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log
>> but it always shows "Compute CTC targets failed", and the model does not
>> perform well at all.
>> I normalized my text, and each line of the text has at most 20 tokens.
>> Could you please help me?
>>
>>




Re: [tesseract-ocr] Text2image doesn't create font list

2018-09-25 Thread Khosrobeigy.zohreh
I have done that, and I will install it again.
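
The undefined symbol in the quoted error below, `_Z16tprintf_internalPKcz`, is the mangled C++ name of `tprintf_internal(const char*, ...)`. That is consistent with the suggestion in the thread that a second Tesseract installation is present, for example if text2image from one build loads a Tesseract library from another. As a first check, the Python sketch below lists every copy of the relevant programs found on the PATH. The function name is made up for this illustration, and the sketch only looks at executables, not at shared libraries (for those, `ldd` on the text2image binary is a common tool on Linux).

```python
import os
from pathlib import Path
from typing import Dict, List


def find_all_on_path(names: List[str], path: str) -> Dict[str, List[str]]:
    """List every executable copy of each name in the PATH directories, in search order."""
    found: Dict[str, List[str]] = {name: [] for name in names}
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file() and os.access(candidate, os.X_OK):
                found[name].append(str(candidate))
    return found


if __name__ == "__main__":
    programs = ["tesseract", "text2image", "lstmtraining"]
    for name, copies in find_all_on_path(programs, os.environ.get("PATH", "")).items():
        print(name, copies)
```

More than one entry for a program means that several installations are visible, and the first one listed is the one the shell runs.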

On Tue, Sep 25, 2018 at 4:46 PM Khosrobeigy.zohreh 
wrote:

> Could you please tell me how I can remove all Tesseract installations, so
> that I can then install it again?
>
> On Tue, Sep 25, 2018 at 4:42 PM Zdenko Podobny  wrote:
>
>> I guess you have another installation of tesseract present in your system.
>> Please uninstall old version/other tesseract before installing new
>> version...
>>
>> Zdenko
>>
>>
>> On Tue, 25 Sep 2018 at 15:03, Khosrobeigy.zohreh
>> wrote:
>>
>>> Today I installed a new version of Tesseract. I used this guide:
>>>
>>> https://bingrao.github.io/blog/post/2017/07/16/Install-Tesseract-4.0-in-ubuntun-16.04.html
>>> and also:
>>> cd leptonica 17.4
>>> ./configure
>>> sudo make
>>> sudo make install
>>> sudo apt-get install g++ (sudo -f install)
>>> sudo apt-get install autoconf automake libtool
>>> sudo apt-get install autoconf-archive
>>> sudo apt-get install pkg-config
>>> sudo apt-get install libpng-dev
>>> sudo apt-get install libjpeg8-dev
>>> sudo apt-get install libtiff5-dev
>>> sudo apt-get install zlib1g-dev
>>> tools for train
>>> sudo apt-get install libicu-dev
>>> sudo apt-get install libpango1.0-dev
>>> sudo apt-get install libcairo2-dev
>>>
>>> install tesseract
>>> ./autogen.sh
>>> ./configure
>>> sudo make
>>> sudo make install
>>> sudo ldconfig
>>> sudo make training
>>> sudo make training-install
>>>
>>>
>>> On Tue, Sep 25, 2018 at 4:25 PM Shree Devi Kumar 
>>> wrote:
>>>
>>>> Your installation of tesseract and training tools has some problem.
>>>>
>>>> How did you build or install tesseract?
>>>>
>>>> On Tue, 25 Sep 2018, 08:52 Khosrobeigy.zohreh, 
>>>> wrote:
>>>>
>>>>> zohreh@zohreh-TP301UJ:~$ text2image -v
>>>>> text2image: symbol lookup error: text2image: undefined symbol:
>>>>> _Z16tprintf_internalPKcz
>>>>>  It shows same error.
>>>>>
>>>>> On Tue, Sep 25, 2018 at 4:20 PM Shree Devi Kumar 
>>>>> wrote:
>>>>>
>>>>>> What's the output for?
>>>>>>
>>>>>> which text2image
>>>>>>
>>>>>> text2image -v
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 25 Sep 2018, 08:39 Khosrobeigy.zohreh, <
>>>>>> beigy.zoh...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, actually the main problem is this error:
>>>>>>> /usr/local/bin/text2image: symbol lookup error:
>>>>>>> /usr/local/bin/text2image: undefined symbol: _Z16tprintf_internalPKcz
>>>>>>> I have this error. How can I  solve this error?
>>>>>>>
>>>>>>> On Tue, Sep 25, 2018 at 3:41 PM Shree Devi Kumar <
>>>>>>> shreesh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Are the fonts in  /usr/share/fonts ?
>>>>>>>>
>>>>>>>> Reduce the
>>>>>>>> --min_coverage 1
>>>>>>>>
>>>>>>>> to .99 and see if some fonts are found.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 25 Sep 2018, 07:50 Zohreh Khosrobeygi, <
>>>>>>>> beigy.zoh...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I use
>>>>>>>>>  tesseract 4.0.0-beta.4
>>>>>>>>>  leptonica-1.74.4
>>>>>>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6
>>>>>>>>> : zlib 1.2.8
>>>>>>>>>
>>>>>>>>>  Found AVX2
>>>>>>>>>  Found AVX
>>>>>>>>>  Found SSE
>>>>>>>>> But when I run this command:
>>>>>>>>>  text2image --find_fonts \
>>>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>>>> --text ./langdata/fas/fas.training_text \
>>>>>>>>> --min_coverage 1  \
>>>>>>>>> --outputbase ./langdata/fas/fas \

Re: [tesseract-ocr] Text2image doens't create font list

2018-09-25 Thread Khosrobeigy.zohreh
Could you please tell me how I can remove every Tesseract installation so
that I can install it again afterwards?
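
The quoted reply from Zdenko suspects that a second Tesseract installation is still on the system. Before uninstalling, it helps to see every copy of a tool that the search path can reach. A minimal sketch, assuming a POSIX shell; `list_copies` is a hypothetical helper name and is not part of Tesseract:

```shell
# list_copies NAME [PATHLIST]: print every executable file called NAME
# found in the colon-separated PATHLIST (defaults to $PATH).
# Hypothetical helper for illustration; not part of Tesseract.
list_copies() (
  name=$1
  pathlist=${2:-$PATH}
  set -f          # do not glob-expand directory names while splitting
  IFS=:
  for dir in $pathlist; do
    # An empty PATH entry means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
)
```

Running `list_copies tesseract` and `list_copies text2image` prints one line per copy. If more than one path appears, for example a packaged copy and a source build under /usr/local/bin, the older one is a likely candidate for removal. `ldd "$(command -v text2image)"` additionally shows which libtesseract the binary loads at run time.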

On Tue, Sep 25, 2018 at 4:42 PM Zdenko Podobny  wrote:

> I guess you have another installation of tesseract present in your system.
> Please uninstall old version/other tesseract before installing new
> version...
>
> Zdenko
>
>
> Tue, 25 Sep 2018 at 15:03, Khosrobeigy.zohreh
> wrote:
>
>> Today I installed a new version of Tesseract. I used this link:
>>
>> https://bingrao.github.io/blog/post/2017/07/16/Install-Tesseract-4.0-in-ubuntun-16.04.html
>> and also :
>> cd leptonica 17.4
>> ./configure
>> sudo make
>> sudo make install
>> sudo apt-get install g++ (sudo -f install)
>> sudo apt-get install autoconf automake libtool
>> sudo apt-get install autoconf-archive
>> sudo apt-get install pkg-config
>> sudo apt-get install libpng-dev
>> sudo apt-get install libjpeg8-dev
>> sudo apt-get install libtiff5-dev
>> sudo apt-get install zlib1g-dev
>> tools for train
>> sudo apt-get install libicu-dev
>> sudo apt-get install libpango1.0-dev
>> sudo apt-get install libcairo2-dev
>>
>> install tesseract
>> ./autogen.sh
>> ./configure
>> sudo make
>> sudo make install
>> sudo ldconfig
>> sudo make training
>> sudo make training-install
>>
>>
>> On Tue, Sep 25, 2018 at 4:25 PM Shree Devi Kumar 
>> wrote:
>>
>>> Your installation of tesseract and training tools has some problem.
>>>
>>> How did you build or install tesseract?
>>>
>>> On Tue, 25 Sep 2018, 08:52 Khosrobeigy.zohreh, 
>>> wrote:
>>>
>>>> zohreh@zohreh-TP301UJ:~$ text2image -v
>>>> text2image: symbol lookup error: text2image: undefined symbol:
>>>> _Z16tprintf_internalPKcz
>>>>  It shows same error.
>>>>
>>>> On Tue, Sep 25, 2018 at 4:20 PM Shree Devi Kumar 
>>>> wrote:
>>>>
>>>>> What's the output for?
>>>>>
>>>>> which text2image
>>>>>
>>>>> text2image -v
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 25 Sep 2018, 08:39 Khosrobeigy.zohreh, 
>>>>> wrote:
>>>>>
>>>>>> Yes, actually the main problem is this error:
>>>>>> /usr/local/bin/text2image: symbol lookup error:
>>>>>> /usr/local/bin/text2image: undefined symbol: _Z16tprintf_internalPKcz
>>>>>> I have this error. How can I  solve this error?
>>>>>>
>>>>>> On Tue, Sep 25, 2018 at 3:41 PM Shree Devi Kumar <
>>>>>> shreesh...@gmail.com> wrote:
>>>>>>
>>>>>>> Are the fonts in  /usr/share/fonts ?
>>>>>>>
>>>>>>> Reduce the
>>>>>>> --min_coverage 1
>>>>>>>
>>>>>>> to .99 and see if some fonts are found.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 25 Sep 2018, 07:50 Zohreh Khosrobeygi, <
>>>>>>> beigy.zoh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I use
>>>>>>>>  tesseract 4.0.0-beta.4
>>>>>>>>  leptonica-1.74.4
>>>>>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6
>>>>>>>> : zlib 1.2.8
>>>>>>>>
>>>>>>>>  Found AVX2
>>>>>>>>  Found AVX
>>>>>>>>  Found SSE
>>>>>>>> But when I run this command:
>>>>>>>>  text2image --find_fonts \
>>>>>>>> --fonts_dir /usr/share/fonts \
>>>>>>>> --text ./langdata/fas/fas.training_text \
>>>>>>>> --min_coverage 1  \
>>>>>>>> --outputbase ./langdata/fas/fas \
>>>>>>>> |& grep raw | sed -e 's/ :.*/" \\/g'  | sed -e 's/^/  "/'
>>>>>>>> >./langdata/fas/fas.fontslist.txt
>>>>>>>> fas.fontslist.txt is empty. I have some fonts on my linux.
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [tesseract-ocr] Text2image doens't create font list

2018-09-25 Thread Khosrobeigy.zohreh
zohreh@zohreh-TP301UJ:~$ text2image -v
text2image: symbol lookup error: text2image: undefined symbol:
_Z16tprintf_internalPKcz
It shows the same error.

On Tue, Sep 25, 2018 at 4:20 PM Shree Devi Kumar 
wrote:

> What's the output for?
>
> which text2image
>
> text2image -v
>
>
>
>
> On Tue, 25 Sep 2018, 08:39 Khosrobeigy.zohreh, 
> wrote:
>
>> Yes, actually the main problem is this error:
>> /usr/local/bin/text2image: symbol lookup error:
>> /usr/local/bin/text2image: undefined symbol: _Z16tprintf_internalPKcz
>> I have this error. How can I  solve this error?
>>
>> On Tue, Sep 25, 2018 at 3:41 PM Shree Devi Kumar 
>> wrote:
>>
>>> Are the fonts in  /usr/share/fonts ?
>>>
>>> Reduce the
>>> --min_coverage 1
>>>
>>> to .99 and see if some fonts are found.
>>>
>>>
>>>
>>> On Tue, 25 Sep 2018, 07:50 Zohreh Khosrobeygi, 
>>> wrote:
>>>
>>>> Hi,
>>>> I use
>>>>  tesseract 4.0.0-beta.4
>>>>  leptonica-1.74.4
>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 :
>>>> zlib 1.2.8
>>>>
>>>>  Found AVX2
>>>>  Found AVX
>>>>  Found SSE
>>>> But when I run this command:
>>>>  text2image --find_fonts \
>>>> --fonts_dir /usr/share/fonts \
>>>> --text ./langdata/fas/fas.training_text \
>>>> --min_coverage 1  \
>>>> --outputbase ./langdata/fas/fas \
>>>> |& grep raw | sed -e 's/ :.*/" \\/g'  | sed -e 's/^/  "/'
>>>> >./langdata/fas/fas.fontslist.txt
>>>> fas.fontslist.txt is empty. I have some fonts on my linux.
>>>>
>>>>
>>>>
>>
>>
>> --
>> Zohreh Khosrobeygi
>> University of Tehran, 2016
>> Tel: +989196042887
>> khosrobeygi.zo...@ut.ac.ir 
>>

Re: [tesseract-ocr] Text2image doens't create font list

2018-09-25 Thread Khosrobeigy.zohreh
Yes. Actually, the main problem is this error:
/usr/local/bin/text2image: symbol lookup error: /usr/local/bin/text2image:
undefined symbol: _Z16tprintf_internalPKcz
How can I solve it?

On Tue, Sep 25, 2018 at 3:41 PM Shree Devi Kumar 
wrote:

> Are the fonts in  /usr/share/fonts ?
>
> Reduce the
> --min_coverage 1
>
> to .99 and see if some fonts are found.
>
>
>
> On Tue, 25 Sep 2018, 07:50 Zohreh Khosrobeygi, 
> wrote:
>
>> Hi,
>> I use
>>  tesseract 4.0.0-beta.4
>>  leptonica-1.74.4
>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib
>> 1.2.8
>>
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>> But when I run this command:
>>  text2image --find_fonts \
>> --fonts_dir /usr/share/fonts \
>> --text ./langdata/fas/fas.training_text \
>> --min_coverage 1  \
>> --outputbase ./langdata/fas/fas \
>> |& grep raw | sed -e 's/ :.*/" \\/g'  | sed -e 's/^/  "/'
>> >./langdata/fas/fas.fontslist.txt
>> fas.fontslist.txt is empty. I have some fonts on my linux.
>>
>>
>>


-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 



Re: [tesseract-ocr] Make lstm for some files

2018-08-19 Thread Khosrobeigy.zohreh
Hi, when I run tesstrain.sh I get this error:
+ err_exit '/tmp/tmp.N31LQSCg1a/fas/fas.Times_New_Roman.exp0.lstmf does not
exist or is not readable'
+ echo -e 'ERROR: /tmp/tmp.N31LQSCg1a/fas/fas.Times_New_Roman.exp0.lstmf'
does not exist or is not readable
+ tee -a /tmp/tmp.N31LQSCg1a/fas/tesstrain.log
ERROR: /tmp/tmp.N31LQSCg1a/fas/fas.Times_New_Roman.exp0.lstmf does not
exist or is not readable
+ exit 1

Tesseract -v:
tesseract 4.0.0-beta.1
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib
1.2.8

 Found AVX2
 Found AVX
 Found SSE





On Thu, Aug 16, 2018 at 6:28 PM Shree Devi Kumar 
wrote:

> You need to make lstmf file for each of these.
>
> eg.  tesseract  fas.B_Mitra.exp0.tif  fas.B_Mitra.exp0 --psm 6 lstm.train
>
> will create  fas.B_Mitra.exp0.lstmf
>
>
>
> On Thu, Aug 16, 2018 at 5:40 PM, Zohreh Khosrobeygi <
> beigy.zoh...@gmail.com> wrote:
>
>> I have some tif and box files for each font for example:
>> fas.B_Mitra.exp0.box
>> fas.B_Mitra.exp0.tif
>> fas.B_Mitra.exp1.box
>> fas.B_Mitra.exp1.tif
>> fas.B_Mitra.exp2.box
>> fas.B_Mitra.exp2.tif
>> .
>> .
>> .
>> How can I make lstm for each of them?
>> Thx.
>>
>
>
>
> --
>
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>


-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 



Re: [tesseract-ocr] Re: LSTM files

2018-08-14 Thread Khosrobeigy.zohreh
OK, but I have several .tif and .box files for each font, for example:
fas.B_Mitra.exp0.box
fas.B_Mitra.exp0.tif
fas.B_Mitra.exp1.box
fas.B_Mitra.exp1.tif
fas.B_Mitra.exp2.box
fas.B_Mitra.exp2.tif
.
.
.
How can I make an .lstmf file for each of them?
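
The replies in this archive give the per-file command as `tesseract NAME.tif NAME --psm 6 lstm.train`. Repeating it for every image is a simple loop. A minimal sketch, assuming all .tif files (with their .box files beside them) sit in one directory; `lstmf_commands` is a hypothetical helper that only prints the commands so they can be checked first:

```shell
# lstmf_commands DIR: print one tesseract command per .tif file in DIR.
# Each command follows the form quoted in this thread:
#   tesseract NAME.tif NAME --psm 6 lstm.train
# Hypothetical helper; it only prints the commands (dry run).
lstmf_commands() {
  for tif in "$1"/*.tif; do
    # Skip the unexpanded pattern when DIR has no .tif files.
    [ -e "$tif" ] || continue
    base=${tif%.tif}
    printf 'tesseract "%s" "%s" --psm 6 lstm.train\n' "$tif" "$base"
  done
}
```

Once the printed lines look right, `lstmf_commands /path/to/dir | sh` runs them; each run is expected to leave NAME.lstmf next to NAME.tif.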



On Tue, Aug 14, 2018 at 4:56 PM,  wrote:

> I mean: put all the file paths in this file, then run lstmtraining.
> # cat eng.training_files.txt
> /home/tess-ocr/model_output/test//eng.Arial.exp0.lstmf
> /home/tess-ocr/model_output/test//eng.Microsoft_YaHei.exp0.lstmf
> /home/tess-ocr/model_output/test//eng.Times_New_Roman.exp0.lstmf
>
>
> On Tuesday, August 14, 2018 at 6:04:48 PM UTC+8, Zohreh Khosrobeygi wrote:
>>
>> Sorry, I couldn't understand.
>> Could you please explain more this "and then put all the lstm files
>> together in training_files.txt"
>>
>> On Tue, Aug 14, 2018 at 1:19 PM,  wrote:
>>
>>> You should run the tesseract command for each of your box/tif pairs:
>>> tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
>>> and then put all the lstm files together in training_files.txt
>>>
>>> On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:

 Hi,
 I have been training the Persian language. My text is too large, so I had to
 generate 18 box files and 18 tifs for one text. Then I made one unicharset
 for all 18 files. Now when I try to make the lstmf files, it creates just one,
 "fas.B_Mitra.exp0.lstmf", and does not create one for each of the 18 files.
 I also tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
 ran lstmtraining, but it used only fas.B_Mitra.exp0.lstmf and not the
 other file.
 How can I make an lstmf file for all my boxes?
 Thx.

>>
>>
>>
>> --
>> Zohreh Khosrobeygi
>> University of Tehran, 2016
>> Tel: +989196042887
>> khosrobe...@ut.ac.ir
>>



-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 



Re: [tesseract-ocr] Re: LSTM files

2018-08-14 Thread Khosrobeigy.zohreh
Sorry, I did not understand.
Could you please explain this part in more detail: "and then put all the lstm
files together in training_files.txt"?
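
The list file in question is plain text with one .lstmf path per line, and lstmtraining reads it through `--train_listfile`. A minimal sketch of writing such a file, assuming all .lstmf files sit in one directory; `write_listfile` is a hypothetical helper name:

```shell
# write_listfile DIR LISTFILE: write the path of every .lstmf file in DIR
# to LISTFILE, one per line. Hypothetical helper for illustration.
write_listfile() {
  : > "$2"                       # start with an empty list file
  for f in "$1"/*.lstmf; do
    [ -e "$f" ] || continue      # no .lstmf files: leave the list empty
    printf '%s\n' "$f" >> "$2"
  done
}
```

For example, `write_listfile /path/to/lstmf_dir fas.training_files.txt`. Passing an absolute directory yields absolute entries, which keeps the list valid no matter where lstmtraining is started.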

On Tue, Aug 14, 2018 at 1:19 PM,  wrote:

> You should run the tesseract command for each of your box/tif pairs:
> tesseract ${dir}/lang.font.exp0.tif ${dir}/lang.font.exp0 lstm.train
> and then put all the lstm files together in training_files.txt
>
> On Monday, August 13, 2018 at 6:16:09 PM UTC+8, Zohreh Khosrobeygi wrote:
>>
>> Hi,
>> I have been training the Persian language. My text is too large, so I had to
>> generate 18 box files and 18 tifs for one text. Then I made one unicharset
>> for all 18 files. Now when I try to make the lstmf files, it creates just one,
>> "fas.B_Mitra.exp0.lstmf", and does not create one for each of the 18 files.
>> I also tested something else: I created "fas.B_Nazanin.exp0.lstmf" too and
>> ran lstmtraining, but it used only fas.B_Mitra.exp0.lstmf and not the
>> other file.
>> How can I make an lstmf file for all my boxes?
>> Thx.
>>



-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 



Re: [tesseract-ocr] parameter not found: tessedit_ocr_psm_mode

2018-07-01 Thread Khosrobeigy.zohreh
Thanks. It worked.
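
A misspelled name such as tessedit_ocr_psm_mode can be caught before training by comparing the config file against a list of parameter names the installed build accepts. A minimal sketch, assuming that list has been saved to a text file with the name as the first field of each line; `unknown_params` is a hypothetical helper name:

```shell
# unknown_params CONFIG KNOWN: print each parameter name used in CONFIG
# that does not appear as the first field of any line in KNOWN.
# Blank lines and '#' comments in CONFIG are ignored.
# Hypothetical helper; KNOWN is assumed to hold one parameter per line,
# with the name first and any other fields after it.
unknown_params() {
  awk '
    FILENAME == ARGV[1] { known[$1] = 1; next }   # KNOWN: remember names
    NF == 0 || substr($1, 1, 1) == "#" { next }   # skip blanks and comments
    !($1 in known) { print $1 }                   # report unlisted names
  ' "$2" "$1"
}
```

`tesseract --print-parameters > known.txt` should produce a usable KNOWN file. With the config quoted below, the sketch prints tessedit_ocr_psm_mode, and it prints nothing once that line reads `tessedit_pageseg_mode 6`.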

On Sun, Jul 1, 2018 at 8:10 PM, Shree Devi Kumar 
wrote:

> correct variable is
>
> tessedit_pageseg_mode
>
> On Sun, Jul 1, 2018 at 8:51 PM Shree Devi Kumar 
> wrote:
>
>> what's the output for ?
>>
>> tesseract -v
>>
>> which tesseract
>>
>> which tesstrain.sh
>>
>> On Sun, Jul 1, 2018 at 8:39 PM Zohreh Khosrobeygi 
>> wrote:
>>
>>> Hi,
>>> When I use tesstrain.sh, I get an error about my fas.config.
>>> My config file is:
>>>
>>> tessedit_ocr_engine_mode 1
>>> tessedit_ocr_psm_mode 6
>>>
>>> The error is:
>>>
>>> read_params_file: parameter not found: tessedit_ocr_psm_mode
>>> + [[ 0 -gt 0 ]]
>>> + export TESSDATA_PREFIX=
>>> + TESSDATA_PREFIX=
>>> + for img_file in '${img_files}'
>>> + check_file_readable /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf
>>> + for file in '$@'
>>> + [[ ! -r /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf ]]
>>> + err_exit '/tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not
>>> exist or is not readable'
>>> + echo -e 'ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf'
>>> does not exist or is not readable
>>> + tee -a /tmp/tmp.AjJgcthbHl/fas/tesstrain.log
>>> ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist
>>> or is not readable
>>> + exit 1
>>>
>>> Could you please help me?
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>
>
> --
>
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>



-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 



Re: [tesseract-ocr] Unrecognized argument --linedata_only

2018-06-11 Thread Khosrobeigy.zohreh
I am using this command and it is correct.
I have already trained on 500 lines, but when Tesseract processes the TIFF
files for 48000 images, it shows an error:
No space left on device
My RAM is 16 GB and my swap is 20 GB.
The TIFF files take up 4 GB as well.
On Sat, Jun 9, 2018 at 11:33 AM, ShreeDevi Kumar 
wrote:

> --linedata_only should work.
>
> > tesseract 4.0.0-beta.1
>
> Do you know which commit? Please try with latest code.
>
> >   i am using   src/training/tesstrain.sh
>
> The command you used was:
>
> >  sudo tesstrain.sh
>
> Why do you need sudo?
>
> Please run the script with
>
> bash -x   src/training/tesstrain.sh etc ... and report with the console
> log.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Sat, Jun 9, 2018 at 11:57 AM Zohreh Khosrobeygi 
> wrote:
>
>> Yes, i am using   src/training/tesstrain.sh
>>
>>
>> On Friday, June 8, 2018 at 6:44:27 PM UTC+4:30, shree wrote:
>>>
>>> Are you using the correct version of tesstrain.sh?
>>>
>>> It should be in src/training/tesstrain.sh
>>>
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>>
>>> On Fri, Jun 8, 2018 at 6:49 PM Zohreh Khosrobeygi 
>>> wrote:
>>>
 Hi,
 I have been training Tesseract, but I get this error:

 Unrecognized argument --linedata_only

 This is my version of Tesseract:
 tesseract 4.0.0-beta.1
  leptonica-1.74.4
   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 :
 zlib 1.2.8

  Found AVX2
  Found AVX
  Found SSE

 This is my command:
 sudo tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
   --training_text /home/kddlab/Desktop/tesseract-master/1MyData/fas/fas.training_text \
   --linedata_only \
   --noextract_font_properties \
   --langdata_dir /home/kddlab/Desktop/tesseract-master/langdata \
   --tessdata_dir /home/kddlab/Desktop/tesseract-master/tessdata \
   --fontlist "B Mitra" \
   --output_dir /home/kddlab/Desktop/tesseract-master/1MyData/testfas

 And I have this config file:
 # Use LSTM
 tessedit_ocr_engine_mode 1
 tessedit_pageseg_mode 6

 How can I solve this?




-- 
Zohreh Khosrobeygi
University of Tehran, 2016
Tel: +989196042887
khosrobeygi.zo...@ut.ac.ir 


Re: [tesseract-ocr] Unrecognized argument --linedata_only

2018-06-09 Thread Khosrobeigy.zohreh
Thanks. Your command fixed it.
But next I ran this:

lstmtraining \
  --traineddata /home/kddlab/Desktop/tesseract-master/1MyData/testfas/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
  --model_output /home/kddlab/Desktop/tesseract-master/1MyData/testfasout/base \
  --learning_rate 20e-4 \
  --train_listfile /home/kddlab/Desktop/tesseract-master/1MyData/testfas/fas.training_files.txt \
  --eval_listfile /home/kddlab/Desktop/tesseract-master/1MyData/testfas1/fas.training_files.txt \
  --max_iterations 5000 \
  &>/home/kddlab/Desktop/tesseract-master/1MyData/testfasout/basetrain.log

and now I get this error:

Segmentation fault (core dumped)


Could you please help me again?
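
The message does not say what caused the segmentation fault, and the cause is not established anywhere in this thread. One cheap thing to rule out before rerunning lstmtraining is a list file that names .lstmf files which are missing or unreadable. A minimal sketch of that check; `missing_entries` is a hypothetical helper name:

```shell
# missing_entries LISTFILE: print every path named in LISTFILE that is
# not a readable file. Blank lines are skipped. Hypothetical helper.
missing_entries() {
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    [ -r "$path" ] || printf '%s\n' "$path"
  done < "$1"
}
```

It can be run once on the file given to `--train_listfile` and once on the file given to `--eval_listfile`. Empty output means every entry exists, in which case the fault has to be looked for elsewhere.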

On Sat, Jun 9, 2018 at 11:33 AM, ShreeDevi Kumar 
wrote:

> --linedata_only should work.
>
> > tesseract 4.0.0-beta.1
>
> Do you know which commit? Please try with latest code.
>
> >   i am using   src/training/tesstrain.sh
>
> The command you used was:
>
> >  sudo tesstrain.sh
>
> Why do you need sudo?
>
> Please run the script with
>
> bash -x   src/training/tesstrain.sh etc ... and report with the console
> log.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Sat, Jun 9, 2018 at 11:57 AM Zohreh Khosrobeygi 
> wrote:
>
>> Yes, i am using   src/training/tesstrain.sh
>>
>>
>> On Friday, June 8, 2018 at 6:44:27 PM UTC+4:30, shree wrote:
>>>
>>> Are you using the correct version of tesstrain.sh?
>>>
>>> It should be in src/training/tesstrain.sh
>>>
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>>
>>> On Fri, Jun 8, 2018 at 6:49 PM Zohreh Khosrobeygi 
>>> wrote:
>>>
>>>> Hi,
>>>> I have been training Tesseract, but I get this error:
>>>>
>>>> Unrecognized argument --linedata_only
>>>>
>>>> This is my version of Tesseract:
>>>> tesseract 4.0.0-beta.1
>>>>  leptonica-1.74.4
>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>>>
>>>>  Found AVX2
>>>>  Found AVX
>>>>  Found SSE
>>>>
>>>> This is my command:
>>>> sudo tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
>>>>   --training_text /home/kddlab/Desktop/tesseract-master/1MyData/fas/fas.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties \
>>>>   --langdata_dir /home/kddlab/Desktop/tesseract-master/langdata \
>>>>   --tessdata_dir /home/kddlab/Desktop/tesseract-master/tessdata \
>>>>   --fontlist "B Mitra" \
>>>>   --output_dir /home/kddlab/Desktop/tesseract-master/1MyData/testfas
>>>>
>>>> And I have this config file:
>>>> # Use LSTM
>>>> tessedit_ocr_engine_mode 1
>>>> tessedit_pageseg_mode 6
>>>>
>>>> How can I solve this?
