Re: [tesseract-ocr] Help for training tesseract to recognize a new (dead) language

2018-06-01 Thread shree
Please see https://github.com/Shreeshrii/tessdata_coptic for the traineddata
files.

On Friday, June 1, 2018 at 10:45:11 PM UTC+5:30, Ramast wrote:
>
> Impressive! I thought we would need to do a lot of work in order to reach 
> that stage.
>
>
> ⲁⲩⲱ ⲟⲛ ⲁⲓ̈ⲧⲣⲉⲩ ⲣ̄ ⲥⲟⲟⲩ ⲛ̄ ⲉⲃⲟⲧ ⲉⲩⲕⲏⲧ ⲉ ϩⲃⲟⲩⲣ
> ⲉⲩⲉⲓⲣⲉ ⲛ̄ ⲛⲉ ϩⲃⲏⲩⲉ ⲛ̄ ⲛⲉⲩⲁⲡⲟⲧⲉⲗⲉⲥⲙⲁ ⲙⲛ̄ ⲛⲉⲩ–
> ⲥⲭⲏⲙⲁ ⲧⲏⲣⲟⲩ· ϫⲉ ⲕⲁⲥ ϩⲛ̄ ⲟⲩ ϩⲃⲁ ⲉⲩⲉⲣ̄ ϩⲃⲁ·
> ⲁⲩⲱ ϩⲛ̄ ⲟⲩ ⲡⲗⲁⲛⲏ ⲉⲩⲉⲡⲗⲁⲛⲁ ⲛ̄ϭⲓ ⲛ ⲁⲣⲭⲱ̄ ⲉⲧ
> ϣⲟⲟⲡ ϩⲛ̄ ⲛ ⲁⲓⲱ̄ ⲁⲩⲱ ϩⲛ̄ ⲛⲉⲩⲥⲫⲁⲓⲣⲁ ⲁⲩⲱ ϩⲛ̄  5
> ⲛⲉⲩⲙ̄ⲡⲏⲩⲉ· ⲁⲩⲱ ϩⲛ̄ ⲛⲉⲩⲧⲟⲡⲟⲥ ⲧⲏⲣⲟⲩ· ϫⲉ ⲕⲁⲥ ⲛ̄
> ⲛⲉⲩⲛⲟⲓ̈ ⲛ̄ ⲧⲉⲩϭⲓⲛⲙⲟⲟϣⲉ ⲙ̄ⲙⲓⲛ ⲙ̄ⲙⲟ–
> ?? ⲟⲩ: ⲁⲥϣⲱⲡⲉ ϭⲉ ⲛ̄ⲧⲉⲣⲉ ⲓ̄ⲥ̄ ⲟⲩⲱ ⲉϥϫⲱ ⲛ̄
> ⲡⲉⲓ̈ ϣⲁϫⲉ ⲉⲣⲉ ⲫⲓⲗⲓⲡⲡⲟⲥ ϩⲙⲟⲟⲥ ⲉϥⲥϩⲁⲓ̈ ⲛ̄ ϣⲁϫⲉ
> ⲗ̄ⲁ̄ ⲁ. ⲛⲓⲙ ⲉⲧ ⲉⲣⲉ ⲓ̄ⲥ̄ ϫⲱ ⲙ̄ⲙⲟⲟⲩ; ⲁⲥϣⲱⲡⲉ ϭⲉ ⲙⲛ̄ⲛ̄ⲥⲁ 10
>
>
>
> On 05/31/2018 06:42 AM, ShreeDevi Kumar wrote:
>
> I am attaching the recognition result of the one page image you gave from 
> the test model for Coptic I have built. If you can send me the correct 
> unicode transcription for that page, I can further fine tune it. You can 
> then further modify as per your needs.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6b1e4b38-15a2-4df9-8f53-cfe5629c68da%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Help for training tesseract to recognize a new (dead) language

2018-06-01 Thread Ramast
I am so sorry for the late reply. I sent it yesterday, but for some reason
it is still in my drafts folder.
Here is the original email.


Impressive! I thought we would need to do a lot of work in order to reach
that stage.


ⲁⲩⲱ ⲟⲛ ⲁⲓ̈ⲧⲣⲉⲩ ⲣ̄ ⲥⲟⲟⲩ ⲛ̄ ⲉⲃⲟⲧ ⲉⲩⲕⲏⲧ ⲉ ϩⲃⲟⲩⲣ
ⲉⲩⲉⲓⲣⲉ ⲛ̄ ⲛⲉ ϩⲃⲏⲩⲉ ⲛ̄ ⲛⲉⲩⲁⲡⲟⲧⲉⲗⲉⲥⲙⲁ ⲙⲛ̄ ⲛⲉⲩ–
ⲥⲭⲏⲙⲁ ⲧⲏⲣⲟⲩ· ϫⲉ ⲕⲁⲥ ϩⲛ̄ ⲟⲩ ϩⲃⲁ ⲉⲩⲉⲣ̄ ϩⲃⲁ·
ⲁⲩⲱ ϩⲛ̄ ⲟⲩ ⲡⲗⲁⲛⲏ ⲉⲩⲉⲡⲗⲁⲛⲁ ⲛ̄ϭⲓ ⲛ ⲁⲣⲭⲱ̄ ⲉⲧ
ϣⲟⲟⲡ ϩⲛ̄ ⲛ ⲁⲓⲱ̄ ⲁⲩⲱ ϩⲛ̄ ⲛⲉⲩⲥⲫⲁⲓⲣⲁ ⲁⲩⲱ ϩⲛ̄  5
ⲛⲉⲩⲙ̄ⲡⲏⲩⲉ· ⲁⲩⲱ ϩⲛ̄ ⲛⲉⲩⲧⲟⲡⲟⲥ ⲧⲏⲣⲟⲩ· ϫⲉ ⲕⲁⲥ ⲛ̄
ⲛⲉⲩⲛⲟⲓ̈ ⲛ̄ ⲧⲉⲩϭⲓⲛⲙⲟⲟϣⲉ ⲙ̄ⲙⲓⲛ ⲙ̄ⲙⲟ–
?? ⲟⲩ: ⲁⲥϣⲱⲡⲉ ϭⲉ ⲛ̄ⲧⲉⲣⲉ ⲓ̄ⲥ̄ ⲟⲩⲱ ⲉϥϫⲱ ⲛ̄
ⲡⲉⲓ̈ ϣⲁϫⲉ ⲉⲣⲉ ⲫⲓⲗⲓⲡⲡⲟⲥ ϩⲙⲟⲟⲥ ⲉϥⲥϩⲁⲓ̈ ⲛ̄ ϣⲁϫⲉ
ⲗ̄ⲁ̄ ⲁ. ⲛⲓⲙ ⲉⲧ ⲉⲣⲉ ⲓ̄ⲥ̄ ϫⲱ ⲙ̄ⲙⲟⲟⲩ; ⲁⲥϣⲱⲡⲉ ϭⲉ ⲙⲛ̄ⲛ̄ⲥⲁ 10



On 05/31/2018 06:42 AM, ShreeDevi Kumar wrote:

I am attaching the recognition result of the one page image you gave from
the test model for Coptic I have built. If you can send me the correct
unicode transcription for that page, I can further fine tune it. You can
then further modify as per your needs.



Re: [tesseract-ocr] Help for training tesseract to recognize a new (dead) language

2018-06-01 Thread Ramast Magdy

  
  
Impressive! I thought we would need to do a lot of work in order to reach
that stage.
  
  
  ⲁⲩⲱ ⲟⲛ ⲁⲓ̈ⲧⲣⲉⲩ ⲣ̄ ⲥⲟⲟⲩ ⲛ̄ ⲉⲃⲟⲧ ⲉⲩⲕⲏⲧ ⲉ ϩⲃⲟⲩⲣ
  ⲉⲩⲉⲓⲣⲉ ⲛ̄ ⲛⲉ ϩⲃⲏⲩⲉ ⲛ̄ ⲛⲉⲩⲁⲡⲟⲧⲉⲗⲉⲥⲙⲁ ⲙⲛ̄ ⲛⲉⲩ–
  ⲥⲭⲏⲙⲁ ⲧⲏⲣⲟⲩ· ϫⲉ ⲕⲁⲥ ϩⲛ̄ ⲟⲩ ϩⲃⲁ ⲉⲩⲉⲣ̄ ϩⲃⲁ·
  ⲁⲩⲱ ϩⲛ̄ ⲟⲩ ⲡⲗⲁⲛⲏ ⲉⲩⲉⲡⲗⲁⲛⲁ ⲛ̄ϭⲓ ⲛ ⲁⲣⲭⲱ̄ ⲉⲧ
  ϣⲟⲟⲡ ϩⲛ̄ ⲛ ⲁⲓⲱ̄ ⲁⲩⲱ ϩⲛ̄ ⲛⲉⲩⲥⲫⲁⲓⲣⲁ ⲁⲩⲱ ϩⲛ̄  5
  ⲛⲉⲩⲙ̄ⲡⲏⲩⲉ· ⲁⲩⲱ ϩⲛ̄ ⲛⲉⲩⲧⲟⲡⲟⲥ ⲧⲏⲣⲟⲩ· ϫⲉ ⲕⲁⲥ ⲛ̄
  ⲛⲉⲩⲛⲟⲓ̈ ⲛ̄ ⲧⲉⲩϭⲓⲛⲙⲟⲟϣⲉ ⲙ̄ⲙⲓⲛ ⲙ̄ⲙⲟ–
  ?? ⲟⲩ: ⲁⲥϣⲱⲡⲉ ϭⲉ ⲛ̄ⲧⲉⲣⲉ ⲓ̄ⲥ̄ ⲟⲩⲱ ⲉϥϫⲱ ⲛ̄
  ⲡⲉⲓ̈ ϣⲁϫⲉ ⲉⲣⲉ ⲫⲓⲗⲓⲡⲡⲟⲥ ϩⲙⲟⲟⲥ ⲉϥⲥϩⲁⲓ̈ ⲛ̄ ϣⲁϫⲉ
  ⲗ̄ⲁ̄ ⲁ. ⲛⲓⲙ ⲉⲧ ⲉⲣⲉ ⲓ̄ⲥ̄ ϫⲱ ⲙ̄ⲙⲟⲟⲩ; ⲁⲥϣⲱⲡⲉ ϭⲉ ⲙⲛ̄ⲛ̄ⲥⲁ 10
  
  
  
On 05/31/2018 06:42 AM, ShreeDevi Kumar wrote:

I am attaching the recognition result of the one page image you gave from
the test model for Coptic I have built. If you can send me the correct
unicode transcription for that page, I can further fine tune it. You can
then further modify as per your needs.



[tesseract-ocr] programmatically set writing direction and text line order and orientation in C++

2018-06-01 Thread fzhang556
I'm trying to write a C++ program using Tesseract 4 (beta) to OCR
traditional Chinese text in images. These texts are often written right to
left and top to bottom. I read and printed the page orientation, writing
direction, and text line order that Tesseract detected, but they are
clearly wrong. Is there a way to set these parameters programmatically
before running OCR on the images? I assume there has to be one, but all I
found searching the net were posts saying these parameters are read-only.
Any help will be greatly appreciated!



Re: [tesseract-ocr] Not able install tesseract ocr on ubuntu 17.04

2018-06-01 Thread Александр Поздняков
tesseract 4.00 beta 

*Instructions*


*1. Open a terminal and install apt-transport-https*

sudo apt-get install apt-transport-https


*2. Open /etc/apt/sources.list*

Add this line:

deb https://notesalexp.org/tesseract-ocr/zesty/ zesty main

Replace these repository lines:

deb http://us.archive.ubuntu.com/ubuntu/ zesty main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ zesty-security main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ zesty-updates main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ zesty-proposed main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ zesty-backports main restricted universe multiverse

with:

deb http://old-releases.ubuntu.com/ubuntu/ zesty main restricted universe multiverse
deb http://old-releases.ubuntu.com/ubuntu/ zesty-security main restricted universe multiverse
deb http://old-releases.ubuntu.com/ubuntu/ zesty-updates main restricted universe multiverse
deb http://old-releases.ubuntu.com/ubuntu/ zesty-proposed main restricted universe multiverse
deb http://old-releases.ubuntu.com/ubuntu/ zesty-backports main restricted universe multiverse

Save and close sources.list.


*3. Fetch and install the GnuPG key*

sudo apt-get update -oAcquire::AllowInsecureRepositories=true
sudo apt-get install notesalexp-keyring -oAcquire::AllowInsecureRepositories=true
sudo apt-get update


*4. Enjoy*

sudo apt-get install tesseract-ocr


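P.S. In my humble opinion, you should upgrade to Ubuntu 18.04. Ubuntu 17.04
has reached end of life, which is why its packages are no longer on the main
archive servers.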
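
The replacement in step 2 can also be done with a single substitution instead of editing by hand. This is only a minimal sketch: it assumes GNU sed and that the file uses the archive.ubuntu.com mirrors listed in step 2 (with or without a country prefix such as "us."). The function name rewrite_sources and the scratch file name are made up for illustration. Try it on a copy before touching /etc/apt/sources.list.

```shell
#!/bin/sh
# Sketch: point the apt sources of an end-of-life Ubuntu release at
# old-releases.ubuntu.com. Nothing is changed until the function is called.

# rewrite_sources FILE
# Rewrites the mirror host in FILE in place and keeps the original as FILE.bak.
# Lines for other hosts (PPAs, third-party repositories) are left untouched.
rewrite_sources() {
    sed -i.bak -E \
        's#http://([a-z]+\.)?archive\.ubuntu\.com/ubuntu#http://old-releases.ubuntu.com/ubuntu#g' \
        "$1"
}

# Example on a scratch copy (hypothetical file name):
#   cp /etc/apt/sources.list ./sources.list.test
#   rewrite_sources ./sources.list.test
#   diff ./sources.list.test.bak ./sources.list.test
```

Once the diff looks right, the same function can be run with sudo on the real file, followed by step 3.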

On Friday, June 1, 2018 at 10:07:10 UTC+3, shree wrote:
>
> Please see the email from Alex and follow instructions in that.
>
> On Fri 1 Jun, 2018, 10:08 AM RT-Rakesh wrote:
>
>>
>> Hi ShreeDevi,
>>
>> Thanks for your response.
>>
>> I am still getting this error when trying with the command that you 
>> shared.
>> Please assist me how to go about here. 
>>
>> Thank you very much.
>>
>> user@computer:~$ sudo apt install tesseract-ocr 
>> Reading package lists... Done
>> Building dependency tree   
>> Reading state information... Done
>> The following packages were automatically installed and are no longer 
>> required:
>>   libgnutls-openssl27 postfix-sqlite
>> Use 'sudo apt autoremove' to remove them.
>> The following additional packages will be installed:
>>   libgif7 liblept5 libtesseract4 tesseract-ocr-eng tesseract-ocr-osd
>> The following NEW packages will be installed:
>>   libgif7 liblept5 libtesseract4 tesseract-ocr tesseract-ocr-eng 
>> tesseract-ocr-osd
>> 0 upgraded, 6 newly installed, 0 to remove and 180 not upgraded.
>> Need to get 6,938 kB of archives.
>> After this operation, 21.6 MB of additional disk space will be used.
>> Do you want to continue? [Y/n] y
>> Err:1 http://us.archive.ubuntu.com/ubuntu zesty/main amd64 libgif7 amd64 
>> 5.1.4-0.4
>>   404  Not Found [IP: 91.189.91.23 80]
>> Get:2 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main 
>> amd64 liblept5 amd64 1.74.4-1+nmu1ppa1~zesty1 [929 kB]
>> Get:3 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main 
>> amd64 libtesseract4 amd64 4.00~git2192-10a8a67c-1ppa1~zesty1 [1,180 kB]
>> Get:4 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main 
>> amd64 tesseract-ocr-eng all 4.00~git15-45ed289-1ppa1~zesty1 [1,590 kB]  
>>  
>> Get:5 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main 
>> amd64 tesseract-ocr-osd all 4.00~git15-45ed289-1ppa1~zesty1 [2,989 kB]  
>>  
>> Get:6 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main 
>> amd64 tesseract-ocr amd64 4.00~git2192-10a8a67c-1ppa1~zesty1 [219 kB]  
>>   
>> Fetched 6,907 kB in 25s (271 kB/s)
>> 
>>  
>> E: Failed to fetch 
>> http://us.archive.ubuntu.com/ubuntu/pool/main/g/giflib/libgif7_5.1.4-0.4_amd64.deb
>>   
>> 404  Not Found [IP: 91.189.91.23 80]
>> E: Unable to fetch some archives, maybe run apt-get update or try with 
>> --fix-missing?
>>
>>
>> On Thursday, 31 May 2018 15:24:48 UTC+5:30, shree wrote:
>>>
>>> Remove the existing version, then
>>>
>>>
>>> sudo add-apt-repository ppa:alex-p/tesseract-ocr
>>> sudo apt-get update
>>>
>>>
>>> sudo apt install tesseract-ocr 
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Thu, May 31, 2018 at 12:29 PM, RT-Rakesh wrote:
>>>
>>>> user@computer:~$ sudo apt install tesseract-ocr
>>>> Reading package lists... Done
>>>> Building dependency tree
>>>> Reading state information... Done
>>>> The following packages were automatically installed and are no longer
>>>> required:

Re: [tesseract-ocr] lstmeval gives a perfect result but tesseract fails

2018-06-01 Thread ShreeDevi Kumar
From what I understand of the documentation provided by Ray Smith on LSTM
training, the official models were trained on hundreds of thousands of
text lines in hundreds of fonts. The network spec used for training from
scratch is therefore optimized for models of that size.

You seem to have a different requirement, hence I suggested building the
legacy tesseract model.

You can experiment and see if it is better.
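
The tesstrain.sh call quoted below uses shell variables without showing their values. Here is a rough sketch of how they might be set. Apart from Lang=iqi, which is the language code used in this thread, every path and the font list are placeholder assumptions to adjust for your machine. As far as I understand, calling tesstrain.sh without --linedata_only is what makes it build the legacy model rather than LSTM training data. The helper legacy_train_cmd is hypothetical and only prints the command so the values can be checked first.

```shell
# Placeholder values (assumptions), except Lang, which comes from this thread.
Lang=iqi
tesstrain_dir=$HOME/tesseract/src/training
fonts_dir=/usr/share/fonts
fonts_for_training="Arial Verdana"
langdata_dir=$HOME/langdata
tessdata_dir=$HOME/tessdata
train_output_dir=$HOME/train_output

# Print the legacy training command instead of running it.
# $fonts_for_training is left unquoted on purpose so that each font name
# becomes its own argument after --fontlist.
legacy_train_cmd() {
    echo "$tesstrain_dir/tesstrain.sh" \
        --lang "$Lang" \
        --exposures "0" \
        --fonts_dir "$fonts_dir" \
        --fontlist $fonts_for_training \
        --langdata_dir "$langdata_dir" \
        --tessdata_dir "$tessdata_dir" \
        --training_text "$langdata_dir/$Lang/$Lang.training_text" \
        --output_dir "$train_output_dir"
}

legacy_train_cmd
```

When the printed command looks right, remove the echo inside the function to run the training for real.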

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Jun 1, 2018 at 12:23 PM, Julien Jemine wrote:

> Hi Shree,
>
> Thanks for your answer.
> If you don't mind, could you explain why it'd be better?
>
> On Thursday, May 31, 2018 at 17:25:47 UTC+2, shree wrote:
>>
>> >I've trained a LSTM model for a custom language from scratch as explained
>> here.
>>
>> >The language only has about 100 words and 17 characters, so it's pretty
>> simple.
>>
>> For such a small model, try to build the legacy version rather than LSTM.
>>
>> $tesstrain_dir/tesstrain.sh \
>>    --lang $Lang \
>>    --exposures "0" \
>>    --fonts_dir $fonts_dir \
>>    --fontlist $fonts_for_training \
>>    --langdata_dir $langdata_dir \
>>    --tessdata_dir $tessdata_dir \
>>    --training_text $langdata_dir/$Lang/$Lang.training_text \
>>    --output_dir $train_output_dir
>>
>>
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Thu, May 31, 2018 at 3:43 PM, Julien Jemine wrote:
>>
>>> Hi,
>>>
>>> I've trained a LSTM model for a custom language from scratch as
>>> explained here.
>>>
>>> The language only has about 100 words and 17 characters, so it's pretty
>>> simple.
>>>
>>> When I run lstmeval on my model, I get a perfect match:
>>> [icm@u16-offcao-07] train1$ lstmeval --model /home/icm/share/tessdata/iqi.traineddata --eval_listfile iqitrain2/iqi.training_files.txt --verbosity 2
>>> Loaded 2/2 pages (1-2) of document /home/icm/train1/iqitrain2/iqi.Arial.exp0.lstmf
>>> Loaded 2/2 pages (1-2) of document /home/icm/train1/iqitrain2/iqi.Calibri.exp0.lstmf
>>> Warning: LSTMTrainer deserialized an LSTMRecognizer!
>>> Truth:ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> OCR  :ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> Truth:6CUEN 6 CU EN
>>> OCR  :6CUEN 6 CU EN
>>> Loaded 2/2 pages (1-2) of document /home/icm/train1/iqitrain2/iqi.Lucida_Sans_Typewriter_Semi-Condensed.exp0.lstmf
>>> Truth:ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> OCR  :ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> Truth:6CUEN 6 CU EN
>>> OCR  :6CUEN 6 CU EN
>>> Loaded 2/2 pages (1-2) of document /home/icm/train1/iqitrain2/iqi.Verdana.exp0.lstmf
>>> Truth:ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> OCR  :ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> Truth:6CUEN 6 CU EN
>>> OCR  :6CUEN 6 CU EN
>>> Truth:6CUEN 6 CU EN
>>> OCR  :6CUEN 6 CU EN
>>> Truth:ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> OCR  :ASTM 10FEEN 10 FE EN 13CUEN 13 CU EN 02B 11 16
>>> At iteration 0, stage 0, Eval Char error rate=0, Word error rate=0
>>>
>>> However, when I put my iqi.traineddata file in my tessdata folder and
>>> try to run tesseract on the same tif file, I get errors:
>>> [icm@u16-offcao-07] train1$ tesseract iqitrain2/iqi.training_img.txt
>>> stdout -l iqi
>>> Page 0 : /home/icm/train1/iqitrain2/iqi.Arial.exp0.tif
>>> 6CFN
>>> 6CUEN 1 CU EN
>>> Page 1 : /home/icm/train1/iqitrain2/iqi.Calibri.exp0.tif
>>>
>>> 6CM 10FEEN 0 6 FEE 13CUEN 11 6 FE EEN 1116
>>> 6UEN 16 FE
>>> Page 2 : /home/icm/train1/iqitrain2/iqi.Lucida_Sans_Typewriter_Semi-Condensed.exp0.tif
>>>
>>> 6TM 13CUEN 13 1 EN 11CUE 11 CU EN 12B 11 16
>>> 6 6 CU EN
>>> Page 3 : /home/icm/train1/iqitrain2/iqi.Verdana.exp0.tif
>>>
>>> ASTM 103UEEN 13 1CU EN 13CUEN 13 6 FE EEN 11 16
>>> 6CUEN 6 CU EN
>>>
>>>
>>> Now the really frustrating part: I have the opposite phenomenon with the
>>> "eng" language! (with eng.traineddata taken from tessdata_best)
>>> lstmeval gives me a few errors (Eval Char error rate=2.4665552, Word
>>> error rate=16.67)
>>> tesseract gives me the right answer! (But the images are generated with
>>> tesstrain.sh and very common fonts, it's probably to be expected).
>>>
>>> Am I doing something wrong?
>>> What's going on here?
>>>

Re: [tesseract-ocr] Not able install tesseract ocr on ubuntu 17.04

2018-06-01 Thread ShreeDevi Kumar
Please see the email from Alex and follow the instructions in it.

On Fri 1 Jun, 2018, 10:08 AM RT-Rakesh wrote:

>
> Hi ShreeDevi,
>
> Thanks for your response.
>
> I am still getting this error when trying with the command that you shared.
> Please assist me how to go about here.
>
> Thank you very much.
>
> user@computer:~$ sudo apt install tesseract-ocr
> Reading package lists... Done
> Building dependency tree
> Reading state information... Done
> The following packages were automatically installed and are no longer
> required:
>   libgnutls-openssl27 postfix-sqlite
> Use 'sudo apt autoremove' to remove them.
> The following additional packages will be installed:
>   libgif7 liblept5 libtesseract4 tesseract-ocr-eng tesseract-ocr-osd
> The following NEW packages will be installed:
>   libgif7 liblept5 libtesseract4 tesseract-ocr tesseract-ocr-eng
> tesseract-ocr-osd
> 0 upgraded, 6 newly installed, 0 to remove and 180 not upgraded.
> Need to get 6,938 kB of archives.
> After this operation, 21.6 MB of additional disk space will be used.
> Do you want to continue? [Y/n] y
> Err:1 http://us.archive.ubuntu.com/ubuntu zesty/main amd64 libgif7 amd64
> 5.1.4-0.4
>   404  Not Found [IP: 91.189.91.23 80]
> Get:2 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main
> amd64 liblept5 amd64 1.74.4-1+nmu1ppa1~zesty1 [929 kB]
> Get:3 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main
> amd64 libtesseract4 amd64 4.00~git2192-10a8a67c-1ppa1~zesty1 [1,180 kB]
> Get:4 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main
> amd64 tesseract-ocr-eng all 4.00~git15-45ed289-1ppa1~zesty1 [1,590 kB]
>
> Get:5 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main
> amd64 tesseract-ocr-osd all 4.00~git15-45ed289-1ppa1~zesty1 [2,989 kB]
>
> Get:6 http://ppa.launchpad.net/alex-p/tesseract-ocr/ubuntu zesty/main
> amd64 tesseract-ocr amd64 4.00~git2192-10a8a67c-1ppa1~zesty1 [219 kB]
>
> Fetched 6,907 kB in 25s (271 kB/s)
>
>
> E: Failed to fetch
> http://us.archive.ubuntu.com/ubuntu/pool/main/g/giflib/libgif7_5.1.4-0.4_amd64.deb
> 404  Not Found [IP: 91.189.91.23 80]
> E: Unable to fetch some archives, maybe run apt-get update or try with
> --fix-missing?
>
>
> On Thursday, 31 May 2018 15:24:48 UTC+5:30, shree wrote:
>>
>> Remove the existing version, then
>>
>>
>> sudo add-apt-repository ppa:alex-p/tesseract-ocr
>> sudo apt-get update
>>
>>
>> sudo apt install tesseract-ocr
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Thu, May 31, 2018 at 12:29 PM, RT-Rakesh wrote:
>>
>>> user@computer:~$ sudo apt install tesseract-ocr
>>> Reading package lists... Done
>>> Building dependency tree
>>> Reading state information... Done
>>> The following packages were automatically installed and are no longer
>>> required:
>>>   libgnutls-openssl27 postfix-sqlite
>>> Use 'sudo apt autoremove' to remove them.
>>> The following additional packages will be installed:
>>>   libgif7 liblept5 libtesseract-data libtesseract3 tesseract-ocr-eng
>>>   tesseract-ocr-equ tesseract-ocr-osd
>>> The following NEW packages will be installed:
>>>   libgif7 liblept5 libtesseract-data libtesseract3 tesseract-ocr
>>>   tesseract-ocr-eng tesseract-ocr-equ tesseract-ocr-osd
>>> 0 upgraded, 8 newly installed, 0 to remove and 180 not upgraded.
>>> Need to get 945 kB/14.6 MB of archives.
>>> After this operation, 57.5 MB of additional disk space will be used.
>>> Do you want to continue? [Y/n] y
>>> Err:1 http://us.archive.ubuntu.com/ubuntu zesty/main amd64 libgif7
>>> amd64 5.1.4-0.4
>>>   404  Not Found [IP: 91.189.91.23 80]
>>> Err:2 http://us.archive.ubuntu.com/ubuntu zesty/universe amd64 liblept5
>>> amd64 1.74.1-1
>>>   404  Not Found [IP: 91.189.91.23 80]
>>> E: Failed to fetch
>>> http://us.archive.ubuntu.com/ubuntu/pool/main/g/giflib/libgif7_5.1.4-0.4_amd64.deb
>>> 404  Not Found [IP: 91.189.91.23 80]
>>> E: Failed to fetch
>>> http://us.archive.ubuntu.com/ubuntu/pool/universe/l/leptonlib/liblept5_1.74.1-1_amd64.deb
>>> 404  Not Found [IP: 91.189.91.23 80]
>>> E: Unable to fetch some archives, maybe run apt-get update or try with
>>> --fix-missing?
>>>
>>>
>>> *This is the error being thrown. Can someone help me solve this
>>> issue?*
>>>