Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Shree Devi Kumar
If you have the problem with the master version also, please open an issue
on github.

Please include a stack trace/debug information also.

On Mon, 26 Nov 2018, 10:48 Khosrobeigy.zohreh wrote:

> And I have the problem again
> *Kind regards,*
> *Zohreh Khosrobeygi*
>
> *Student of IT*
>
> *University of Tehran, 2016*
>
> *Phone: (+98)9196042887*
>
> *Email:khosrobeygi.zo...@ut.ac.ir *
>
>
>
> On Mon, Nov 26, 2018 at 6:45 PM Khosrobeigy.zohreh 
> wrote:
>
>> I downloaded tesseract-master from GitHub and reinstalled it, but now
>> my version is:
>> tesseract 4.0.0
>>  leptonica-1.76.0
>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>> Is that correct?
>>
>> On Mon, Nov 26, 2018 at 5:25 PM Shree Devi Kumar 
>> wrote:
>>
>>> Please update to the latest version from github and try.
>>>
>>> On Mon, 26 Nov 2018, 08:36 Khosrobeigy.zohreh wrote:
>>>
>>>> tesseract 4.0.0-beta.4
>>>>  leptonica-1.76.0
>>>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
>>>>  Found AVX2
>>>>  Found AVX
>>>>  Found SSE
>>>>
>>>> On Mon, Nov 26, 2018 at 3:33 PM Shree Devi Kumar wrote:
>>>>
>>>>> What is the version of tesseract?
>>>>>
>>>>> tesseract -v
>>>>>
>>>>> On Mon, 26 Nov 2018, 05:51 Zohreh Khosrobeygi wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have been running training on about 130G of data, which is 4000 files.
>>>>>> My command is
>>>>>>
>>>>>> /home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
>>>>>>   --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
>>>>>>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
>>>>>>   --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
>>>>>>   --learning_rate 0.001 \
>>>>>>   --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
>>>>>>   --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>>>>>>   --max_iterations 15 \
>>>>>>   &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log
>>>>>>
>>>>>> but after reading some files, lstmtraining reports this error and stops
>>>>>> training:
>>>>>>
>>>>>> Loaded 821/10179 pages (1-821) of document
>>>>>> /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
>>>>>> lt-lstmtraining: genericvector.h:720: T&
>>>>>> GenericVector::operator[](int) const [with T = char]: Assertion `index
>>>>>> >= 0 && index < size_used_' failed.
>>>>>>
>>>>>> Could you please help me?
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>>>>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/b60675e3-5008-4584-92b7-f77e5ab0d037%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.

[tesseract-ocr] New jpn_vert.trainnedata

2018-11-26 Thread Seokbong Choi
Hello all,

Although the jpn_vert model from tessdata_best works well, it didn't serve my
purpose: reading comic books.
So I retrained it with a new font and with the expressions that most Japanese
comic books use.

https://github.com/zodiac3539/jpn_vert

   - Added more fonts: Othutome, the font that most comic books use.
   - Trained for almost 200,000 cycles. The character-level error rate is
     less than 0.3%.
   - Whenever Tesseract stumbles upon ♥ or ‼, it is likely to make a mistake
     that distorts the entire sentence, so I trained these characters
     thoroughly. The result is remarkable.

Feel free to leave any comment on my GitHub.


Re: [tesseract-ocr] Tesseract v4 generated incorrect text output

2018-11-26 Thread Seokbong Choi
Hello,

OEM and PSM are values you have to set each time you run tesseract.exe; the
current version cannot pick the best ones for an image automatically. (I hope
this improves in the next version.)
I guess you are in a situation where the best result for each image comes
from a different psm value, right? Unfortunately, choosing it is manual work
in this version.
psm 4 generally works if your text is horizontally aligned, whereas psm 5
works for vertically aligned Chinese-Japanese-Korean (CJK) text.

I ran your bmp with the psm 4 option and it worked, although it produced a
result that you may not want: it appended 용 된 다 at the end of the sentence.
In that case, I would suggest retraining, which may improve accuracy. (I had
a similar issue with Japanese.) I hope this helps.
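
Since the psm has to be chosen by hand, one way to reduce the manual work is to run the same image through a few candidate modes and compare the outputs. The following is only a minimal sketch under assumptions: the helper names `build_cmd` and `run_with_psms` are made up for illustration, the `tesseract` binary is assumed to be on the PATH, and the candidate modes (4 and 11) are simply the ones mentioned in this thread. The `-l` and `--psm` options are the same ones used in the commands quoted below.

```python
import shutil
import subprocess
from pathlib import Path


def build_cmd(image, out_base, lang, psm):
    """Build the tesseract command line for one page segmentation mode.

    Hypothetical helper: mirrors `tesseract Korean.bmp Korean-psm11 -l kor --psm 11`.
    """
    return ["tesseract", str(image), f"{out_base}-psm{psm}",
            "-l", lang, "--psm", str(psm)]


def run_with_psms(image, lang="kor", psms=(4, 11)):
    """Run tesseract once per psm and return {psm: recognized text}."""
    out_base = Path(image).with_suffix("")  # Korean.bmp -> Korean
    results = {}
    for psm in psms:
        subprocess.run(build_cmd(image, out_base, lang, psm),
                       check=True, capture_output=True)
        # tesseract appends ".txt" to the output base name.
        results[psm] = Path(f"{out_base}-psm{psm}.txt").read_text(encoding="utf-8")
    return results


if __name__ == "__main__":
    # Only runs when tesseract and the sample image are actually available.
    if shutil.which("tesseract") and Path("Korean.bmp").exists():
        for psm, text in run_with_psms("Korean.bmp").items():
            print(f"--- psm {psm} ---")
            print(text)
```

Which output is "best" still has to be judged by a person or by some scoring rule of your own; the sketch only automates producing the candidates.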

[image: image.png]


On Mon, Nov 26, 2018 at 12:18 PM Hwa Chuang  wrote:

> I was testing Tesseract v4 and found that some text files generated from
> images contain incorrect strings. For example, I have the image below:
>
>
> [image: 2018-11-26 11_29_42-Photos.png]
>
> $ ./tesseract.exe Korean.bmp Korean -l kor
> Tesseract Open Source OCR Engine v4.0.0 with Leptonica
>
> $ cat Korean.txt
> 을 만 나 서 반 가 워 요 ! 이 테 스 트 목 적 을 위해 사
>
> 志 巳
> 必 白
>
> It's pretty clear that the output text string is almost completely incorrect.
> However, I get the correct text string if the page segmentation mode is 11.
>
> $ ./tesseract.exe Korean.bmp Korean-psm11 -l kor --psm 11
> Tesseract Open Source OCR Engine v4.0.0 with Leptonica
>
> $ cat Korean-psm11.txt
> 당 신 이 선 생 님 을 만 나 서 반 가 워 요 ! 이 테 스 트 목 적 을 위해 사
>
> 용 된 다 .
>
> The problem is that I cannot change the psm image by image.
>
> Any suggestion?


[tesseract-ocr] Analyze output from the OCR tutorial

2018-11-26 Thread Marziye Rahmati
Hello all,
Can anyone help me understand the output from Tesseract 4 training?
For example, what does delta mean in the following line?
At iteration 3052/5000/5102, Mean rms=0.85%, delta=0.98%, char train=3.846%,
word train=5.917%, skip ratio=2.1%,  New worst char error = 3.846 wrote
checkpoint.
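
To compare these numbers across many iterations, it can help to pull them out of the log into named fields. The sketch below is an illustration only: the function name `parse_progress` and the field names are made up, the pattern is written against the single log line quoted above, and it does not try to say what each value means.

```python
import re

# Pattern written against the one progress line quoted in this thread.
LOG_PATTERN = re.compile(
    r"At iteration (?P<it_a>\d+)/(?P<it_b>\d+)/(?P<it_c>\d+), "
    r"Mean rms=(?P<rms>[\d.]+)%, delta=(?P<delta>[\d.]+)%, "
    r"char train=(?P<char_train>[\d.]+)%, word train=(?P<word_train>[\d.]+)%, "
    r"skip ratio=(?P<skip_ratio>[\d.]+)%"
)


def parse_progress(line):
    """Return the numeric fields of one progress line, or None if it does not match."""
    flat = " ".join(line.split())  # undo line wrapping added by email clients
    match = LOG_PATTERN.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    # The three slash-separated counters are kept as a tuple, in log order.
    iterations = tuple(int(fields.pop(key)) for key in ("it_a", "it_b", "it_c"))
    rates = {name: float(value) for name, value in fields.items()}
    return {"iterations": iterations, **rates}


if __name__ == "__main__":
    sample = (
        "At iteration 3052/5000/5102, Mean rms=0.85%, delta=0.98%, char\n"
        "train=3.846%, word train=5.917%, skip ratio=2.1%,  New worst char error =\n"
        "3.846 wrote checkpoint."
    )
    print(parse_progress(sample))
```

Feeding every line of a training log through this function and keeping the non-None results gives a table that can be plotted or sorted by any field.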


[tesseract-ocr] Re: Extract Header and Footer text separately from document image

2018-11-26 Thread bohdan . moskalevskyi
Same here. I’m surprised this issue isn’t more common. Any solutions?

On Monday, April 9, 2018 at 15:43:41 UTC+3, Mohit Jain wrote:
>
> Is there a way to extract the header and footer content on a document page 
> separately using Tesseract OCR? I tried the hOCR output but it doesn't seem 
> to have any such tags associated with the output.
>
> Regards,
> Mohit
>


Re: [tesseract-ocr] How recognize footnotes

2018-11-26 Thread bohdan . moskalevskyi
hOCR doesn’t help. See also:
https://groups.google.com/forum/#!searchin/tesseract-ocr/footer%7Csort:date/tesseract-ocr/YY4jMNmSoTM/KAMTzkc5AQAJ

On Tuesday, May 30, 2017 at 17:57:43 UTC+3, shree wrote:
>
> Try the `hocr` output and see if it provides some of what you need.
>
> I don't think tesseract will link to footnotes though it may recognize the 
> text.
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Tue, May 30, 2017 at 7:20 PM, Felipe Ghiardo wrote:
>
>> Hi all.
>>
>> With other OCR engines (ABBYY, for example), the process recognizes
>> footnotes and links them. It also recognizes headers and footers. The
>> question is how I can do the same with Tesseract, at least for footnotes.
>> Is it something that one can train? And how do you do it? Thanks for the
>> help (and sorry for my English).


[tesseract-ocr] Re: Handwriting training

2018-11-26 Thread DreadStarX
AFAIK, Tesseract doesn't do handwriting, though I could be mistaken. There's
another application that scans handwriting.

On Monday, November 26, 2018 at 4:40:48 AM UTC-8, Rob wrote:
>
> Hello everyone,
>
> I am currently working on making a scanned fillable text document readable 
> for the computer. This document can be filled in with computer writing as 
> well as with handwriting. The quality of the scanned document is good 
> enough and the font is not too small. I'm using Ubuntu 18.04, Python 3 and 
> Tesseract 4.0.
>
> What is the best way to recognize both types of font (in particular 
> handwriting)? Do you have some easy steps for me to achieve the training 
> for this problem?
> I found this: "https://github.com/OCR-D/ocrd-train". It seems to make the 
> training process a lot easier, right?
>
> Thanks in advance and best wishes.
>
>
>


Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Khosrobeigy.zohreh
And I have the problem again

Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Khosrobeigy.zohreh
I downloaded tesseract-master from GitHub and reinstalled it, but now my
version is:
tesseract 4.0.0
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
 Found AVX2
 Found AVX
 Found SSE
Is that correct?


Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Shree Devi Kumar
Please update to the latest version from github and try.


Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Khosrobeigy.zohreh
tesseract 4.0.0-beta.4
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
 Found AVX2
 Found AVX
 Found SSE


[tesseract-ocr] Handwriting training

2018-11-26 Thread Rob
Hello everyone,

I am currently working on making a scanned fillable form machine-readable. 
The form can be filled in with typed text as well as with handwriting. The 
quality of the scans is good and the font is not too small. I'm using 
Ubuntu 18.04, Python 3 and Tesseract 4.0.

What is the best way to recognize both kinds of text (handwriting in 
particular)? Are there some easy steps I can follow to train Tesseract for 
this problem?
I found https://github.com/OCR-D/ocrd-train, which seems to make the 
training process a lot easier. Is that right?

Thanks in advance and best wishes.




Re: [tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Shree Devi Kumar
What is the version of tesseract?


tesseract -v
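
When comparing installs, it can help to read the version string
programmatically. The following is only a minimal sketch, assuming the
first line of the `tesseract -v` output has the form `tesseract <version>`
as in the outputs quoted in this thread. The helper name
parse_tesseract_version is made up for illustration and is not part of
Tesseract.

```python
import re
from typing import Optional, Tuple

# Assumed format of the first output line: "tesseract 4.0.0" or
# "tesseract 4.0.0-beta.4" (taken from the outputs quoted in this thread).
VERSION_RE = re.compile(r"^tesseract\s+(\d+(?:\.\d+)*)(?:-(\S+))?")


def parse_tesseract_version(output: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (version, pre_release_tag) from `tesseract -v` text, or None."""
    for line in output.splitlines():
        match = VERSION_RE.match(line.strip())
        if match:
            # group(2) is None for a plain release such as "4.0.0"
            return match.group(1), match.group(2)
    return None
```

A pre-release tag such as "beta.4" in the second field suggests the
installed binary is older than the 4.0.0 release.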

On Mon, 26 Nov 2018, 05:51 Zohreh Khosrobeygi wrote:

> Hi,
> I have been training on about 130 GB of data in 4000 files. My command is:
>
> /home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
>   --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
>   --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
>   --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
>   --learning_rate 0.001 \
>   --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
>   --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
>   --max_iterations 15 \
>   &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log
>
> but after reading some files, tesseract gives this error and stops
> training:
>
> Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
> lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
> Could you please help me?



[tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Zohreh Khosrobeygi
Hi, 
I have been training on about 130 GB of data in 4000 files. My command is:

/home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
  --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
  --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
  --learning_rate 0.001 \
  --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
  --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
  --max_iterations 15 \
  &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log

but after reading some files, tesseract gives this error and stops 
training:

Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Could you please help me?
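
One way to narrow this down might be to find which .lstmf file was being
loaded when the assertion fired. The sketch below is an assumption-based
helper, not part of Tesseract: it scans a saved training log (such as
basetrain.log from the command above) for the "Loaded N/M pages ... of
document" lines shown in the output above and reports the last document
path seen before the first "Assertion" line. The function name
last_document_before_assertion is hypothetical, and the code assumes the
path appears either on the same line or on the line that follows.

```python
import re
from typing import Iterable, Optional

# Matches the progress lines from the log above, e.g.
# "Loaded 821/10179 pages (1-821) of document /path/to/file.lstmf".
# The path group may be empty if the path was wrapped onto the next line.
LOADED_RE = re.compile(r"Loaded \d+/\d+ pages \(\d+-\d+\) of document\s*(\S*)")


def last_document_before_assertion(lines: Iterable[str]) -> Optional[str]:
    """Return the last document reported as loaded before an assertion line.

    Returns None if the log contains no assertion failure.
    """
    last_doc = None
    expect_path = False  # True when the path is expected on the next line
    for raw in lines:
        line = raw.strip()
        if expect_path:
            expect_path = False
            if line.endswith(".lstmf"):
                last_doc = line
                continue
        match = LOADED_RE.search(line)
        if match:
            if match.group(1):
                last_doc = match.group(1)
            else:
                expect_path = True
            continue
        if "Assertion" in line:
            return last_doc
    return None
```

For example, passing an open file handle for basetrain.log to the function
returns a candidate path. The last document loaded is only a first suspect,
not proof of the cause; a possible next step is to remove that path from
the training list file and see whether the run gets further.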
