Re: [tesseract-ocr] Problems recognized mixed scripts in Tesseract 4 alpha

2017-08-31 Thread ShreeDevi Kumar
Have you tried the best trained data for Chinese, which has English in addition to Chinese as part of the training? That may be a better option than using eng+ On 31-Aug-2017 12:31 PM, "Brendan O'Kane" wrote: > Hi all, > > Running 'tesseract -l eng+chi_tra' on a scanned page of

Re: [tesseract-ocr] Re: error when make

2017-08-30 Thread ShreeDevi Kumar
See https://abi-laboratory.pro/tracker/timeline/tesseract/ and https://github.com/tesseract-ocr/tesseract/issues/793 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Aug 30, 2017 at 7:27 AM, Carlos Miguens

Re: [tesseract-ocr] Spanish text better processed in eng than in spa

2017-08-29 Thread ShreeDevi Kumar
I have opened this as an issue at https://github.com/tesseract-ocr/tessdata/issues/77 You can provide additional feedback there. @theraysmith is doing the training at Google. The examples you provide will be helpful to him and improve future training. ShreeDevi

Re: [tesseract-ocr] Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

2017-08-29 Thread ShreeDevi Kumar
Also see https://github.com/tesseract-ocr/tesseract/issues/221 On 29-Aug-2017 3:26 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > Check where the osd.traineddata and eng.traineddata are installed. > Download other trained data to same directory. > > On Lin

Re: [tesseract-ocr] Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

2017-08-29 Thread ShreeDevi Kumar
Check where the osd.traineddata and eng.traineddata are installed. Download other trained data to the same directory. On Linux, it is usually /usr/share/tessdata On 29-Aug-2017 1:58 PM, "vikram charan" wrote: > Hello, > I'm working on project which base on scan many kind of
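The check described in this reply can be sketched in shell. This is only an illustration, not a Tesseract tool: `find_tessdata_dirs` is a hypothetical helper name, and the default search root is an assumption.

```shell
# Print each directory under a search root that holds *.traineddata files.
# find_tessdata_dirs is a hypothetical helper, not a Tesseract command.
find_tessdata_dirs() {
    search_root="${1:-/usr}"   # assumed default; pass another root if needed
    find "$search_root" -type f -name '*.traineddata' 2>/dev/null |
        while IFS= read -r file; do
            dirname "$file"
        done | sort -u
}

# Example:
#   find_tessdata_dirs /usr
# If this prints /usr/share/tessdata, the error message quoted in the
# subject line asks for the parent of that directory:
#   export TESSDATA_PREFIX=/usr/share
```

Any additional language files would then go into the directory the helper prints, next to osd.traineddata and eng.traineddata.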

Re: [tesseract-ocr] tesseract is not working for straightforward image

2017-08-29 Thread ShreeDevi Kumar
Take a look at the ImproveQuality page in the wiki. On 28-Aug-2017 6:16 PM, "Lada Tylich" wrote: > Hi, > I am confused that for the attached image it gives with parameter *-psm > 7* result *88C. *It should detect such a picture, I guess. > Am I missing something something? > >

Re: [tesseract-ocr] Tesseract OCR 4.0.0 Alpha how to train a new font

2017-08-29 Thread ShreeDevi Kumar
Try first with best/Latin.traineddata that should handle text with diacritics --- >>Pango suggested font Gandhari Unicode. Use "Gandhari Unicode" within quotes as Font name >>ERROR: Could not find training text file /usr/local/share/tessdata// eng/eng.training_text give script_dir

Re: [tesseract-ocr] Spanish text better processed in eng than in spa

2017-08-28 Thread ShreeDevi Kumar
tor. See https://github.com/DanBloomberg/leptonica/commits/master ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 29, 2017 at 6:47 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > I had not chec

Re: [tesseract-ocr] Spanish text better processed in eng than in spa

2017-08-28 Thread ShreeDevi Kumar
I had not checked the list. It should actually be Latin.traineddata for all languages written in Latin script. Not Spanish, as I had written. On 29-Aug-2017 3:54 AM, wrote: > So... I have installed the default tessdata used by the installer, which > seems to be this

Re: [tesseract-ocr] Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-28 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact The following command extracts the .lstm file from the .traineddata file. training/combine_tessdata -e tessdata/best/eng.traineddata ~/tesstutorial/impact_from_full/eng.lstm ShreeDevi

Re: [tesseract-ocr] Calling Resource sha1 is disabled! Use Resource sha256 instead Error while installing tesseract in mac

2017-08-28 Thread ShreeDevi Kumar
Try $ brew update $ brew install tesseract --HEAD ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Aug 28, 2017 at 12:33 PM, Mahesh Mesta wrote: > Hello, > > up votedown

Re: [tesseract-ocr] Spanish text better processed in eng than in spa

2017-08-28 Thread ShreeDevi Kumar
Have you tried with the 'best' traineddatas? What about results using best/Spanish vs best/spa? I have opened this as an issue at https://github.com/tesseract-ocr/tessdata/issues/77 You can provide additional feedback there. ShreeDevi

Re: [tesseract-ocr] error while loading shared libraries: libtesseract.so.4: cannot open shared object file: No such file or directory

2017-08-27 Thread ShreeDevi Kumar
Did you run sudo ldconfig? Try to run tesseract after that. On 27-Aug-2017 7:53 PM, "Dan9er" wrote: > PATH=/home/dan9er/bin:/home/dan9er/.local/bin:/usr/local/ > sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/ >

Re: [tesseract-ocr] error while loading shared libraries: libtesseract.so.4: cannot open shared object file: No such file or directory

2017-08-27 Thread ShreeDevi Kumar
Try sudo ldconfig -- type env to see your environment variables, including PATH -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to

Re: [tesseract-ocr] error while loading shared libraries: libtesseract.so.4: cannot open shared object file: No such file or directory

2017-08-27 Thread ShreeDevi Kumar
Do a search on libtesseract.so in your console.txt. See if the path where it has been installed is available when you run tesseract. Otherwise add it to your PATH environment variable. ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] Re: Does unicharset affect recognition quality ?

2017-08-25 Thread ShreeDevi Kumar
/best/ in ~alex-p profile. >>> But I found kan.traineddata in package tesseract-lang-4.00 (in >>> tesseract-lang-3.05 the language Kannada is absent). >>> I have to got this file and start recognise - result is the same. >>> This package is dated at 08.01.1

Re: [tesseract-ocr] Re: Does unicharset affect recognition quality ?

2017-08-25 Thread ShreeDevi Kumar
>> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr >> >> For ppa >> >> On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" <shree...@gmail.com> wrote: >> >>> Latest GitHub source in master branch is for 4.0alpha. you can install >>> via

Re: [tesseract-ocr] Dropped single character words

2017-08-25 Thread ShreeDevi Kumar
> >> https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr >> >> For the ppa >> >> On 25-Aug-2017 12:45 AM, "ShreeDevi Kumar" <shree...@gmail.com> wrote: >> >>> There is an unofficial ppa package available with latest code, if you do

Re: [tesseract-ocr] Re: Does unicharset affect recognition quality ?

2017-08-25 Thread ShreeDevi Kumar
https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr For ppa On 25-Aug-2017 5:22 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > Latest GitHub source in master branch is for 4.0alpha. you can install via > post. > > Search for tesseract PPA Alex in

Re: [tesseract-ocr] Dropped single character words

2017-08-25 Thread ShreeDevi Kumar
https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr For the ppa On 25-Aug-2017 12:45 AM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > There is an unofficial ppa package available with latest code, if you do > not want to build it. > > -- Excuse the

Re: [tesseract-ocr] Re: Does unicharset affect recognition quality ?

2017-08-25 Thread ShreeDevi Kumar
Latest GitHub source in master branch is for 4.0alpha. You can install via PPA. Search for tesseract PPA Alex in Google. _sent from phone On 25-Aug-2017 4:42 PM, "Yury" wrote: > Hello again. > > I found this: https://github.com/tesseract-ocr/tessdata/blob/ >

Re: [tesseract-ocr] Re: Does unicharset affect recognition quality ?

2017-08-25 Thread ShreeDevi Kumar
Have you tried the new tessdata/best/*.traineddata with the latest github sources? -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to

Re: [tesseract-ocr] Dropped single character words

2017-08-24 Thread ShreeDevi Kumar
There is an unofficial ppa package available with latest code, if you do not want to build it. -- Excuse the brevity, msg sent from phone. On 25-Aug-2017 12:41 AM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > You can try building latest GitHub source for 4.0alpha and t

Re: [tesseract-ocr] Dropped single character words

2017-08-24 Thread ShreeDevi Kumar
You can try building latest GitHub source for 4.0alpha and test with the best/eng.traineddata from the tessdata repository. -- Excuse the brevity, msg sent from phone. On 25-Aug-2017 12:36 AM, "Clinton Graham" wrote: > Do you have any simple suggestions for improving OCR

Re: [tesseract-ocr] Error in Layout Analysis with Tesseract OCR 4.0.0alpha

2017-08-23 Thread ShreeDevi Kumar
Skipping words is a known issue in tesseract. amitdo has a proposed patch for it. Look in the tesseract issues. You can see if it helps in your case. -- Excuse the brevity, msg sent from phone. On 23-Aug-2017 9:16 PM, "Nirajan Pant" wrote: > Yeah! I have tried both gimagereader

Re: [tesseract-ocr] Error in Layout Analysis with Tesseract OCR 4.0.0alpha

2017-08-23 Thread ShreeDevi Kumar
You could try doing your own layout analysis instead of relying on tesseract's auto mode. Have you tried gimagereader and vietocr as GUI interfaces for tesseract for Nepali? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Re: [tesseract-ocr] The net_spec in the chi_sim.traineddata

2017-08-23 Thread ShreeDevi Kumar
Loaded file ./tess4training-save/tess4training-vedic/tessdata/best/Devanagari.lstm, unpacking... Warning: LSTMTrainer deserialized an LSTMRecognizer! Code range changed from 217 to 157!! Num (Extended) outputs,weights in Series: 1,48,0,1:1, 0 Num (Extended) outputs,weights in Series: C3,3:9,

Re: [tesseract-ocr] Re: Msg from Ray - Calling for community contribution for some languages

2017-08-23 Thread ShreeDevi Kumar
> yor.traineddata doesn't seem robust enough I have added as an issue - see https://github.com/tesseract-ocr/langdata/issues/89 > My project right now needs more training data to make the model more robust. It is very tough to find properly marked yoruba text on the internet. You can see if

Re: [tesseract-ocr] The net_spec in the chi_sim.traineddata

2017-08-23 Thread ShreeDevi Kumar
I think that number is ignored and the actual number generated from the unicharset is used. Usually there will be a message right at the beginning of training showing the number being used. On 23-Aug-2017 12:21 PM, wrote: > Hello, > > I have pulled out the network of the

Re: [tesseract-ocr] Training from scratch to re-train the chi_sim.traineddata for studying

2017-08-22 Thread ShreeDevi Kumar
The files will be at Google. You have to wait till Ray Smith updates the repository. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 22, 2017 at 12:58 PM, wrote: > Thanks for your

Re: [tesseract-ocr] Training from scratch to re-train the chi_sim.traineddata for studying

2017-08-22 Thread ShreeDevi Kumar
The langdata files have not been updated for 4.00alpha ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 22, 2017 at 12:17 PM, wrote: > Hello, > > I'm trying to re-train the

Re: [tesseract-ocr] Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-21 Thread ShreeDevi Kumar
lstm file is the language model. It is saved in traineddata file. dawgs are a kind of compressed files, created from lists of words, punctuation or numbers. You can use dawg2wordlist to unpack them. Please follow the instructions on the training wiki page. -- You received this message because

Re: [tesseract-ocr] Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-21 Thread ShreeDevi Kumar
training/combine_tessdata -e tessdata/best/eng.traineddata ~/tesstutorial/impact_from_full/eng.lstm On 04-Aug-2017 12:03 PM, wrote: > Hello, > > I use the 'git pull' command to update the code from the link > https://github.com/tesseract-ocr/tesseract.git, and I

Re: [tesseract-ocr] where can i find chinese original training data for re-train tesseract 4.0

2017-08-18 Thread ShreeDevi Kumar
The lead developer of tesseract-ocr is Ray Smith (at Google), @theraysmith on GitHub. He is in the process of updating the files for the 4.0.0 beta release, expected soon. See https://github.com/tesseract-ocr/langdata/issues/35#issuecomment-320330996 ShreeDevi

Re: [tesseract-ocr] where can i find chinese original training data for re-train tesseract 4.0

2017-08-18 Thread ShreeDevi Kumar
langdata has NOT been updated for 4.0. Please wait for update from Ray. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Aug 18, 2017 at 12:42 PM, <514358...@qq.com> wrote: > hi,all: > > I want to re-train

Re: [tesseract-ocr] Re: Improve the accuracy rate by training Tesseract4.0(LSTM), using fine tune

2017-08-18 Thread ShreeDevi Kumar
2017-08-18 12:48 GMT+05:30 <514358...@qq.com>: > chi_sim.traineddata is not for LSTM4.0 > > That is not correct. https://github.com/tesseract-ocr/tessdata/blob/master/best/chi_sim.traineddata -- You received this message because you are subscribed to the Google Groups "tesseract-ocr"

Re: [tesseract-ocr] Newbie: wondering why a fairly crisp document has such low accuracy

2017-08-12 Thread ShreeDevi Kumar
With English you should probably get close to 99% accuracy. Is your png at 300 dpi? Which version of tesseract did you use? Which traineddata? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Aug 12, 2017 at

Re: [tesseract-ocr] Creation of encoded unicharset failed While constructing LSTM training data.

2017-08-10 Thread ShreeDevi Kumar
​Seems to work fine for me. Are you sure that you have relevant files in the directories listed in that command? check tessdata, langdata location. Use tessdata/best/*.traineddata as the existing models.​ ShreeDevi भजन - कीर्तन -

Re: [tesseract-ocr] 4.0-training

2017-08-08 Thread ShreeDevi Kumar
The training instructions for 4.0 have changed. Please see the wiki. Which language are you trying to train? Have you tried the current tessdata/best/*.traineddata model? What's your feedback on those? ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110

2017-08-07 Thread ShreeDevi Kumar
You also need to provide a traineddata file as input Please review the updated training instructions in the wiki and change the training commands accordingly. On 07-Aug-2017 6:15 PM, "Ava Nimaee" wrote: > hi how can you solve it? i have this error too. > please help me

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-07 Thread ShreeDevi Kumar
,weights in Series: >>>> C3,3:9, 0 >>>> Ft16:16, 160 >>>> Total weights = 160 >>>> [C3,3Ft16]:16, 160 >>>> Mp3,3:16, 0 >>>> Lfys48:48, 12480 >>>> Lfx96:96, 55680 >>>> Lrx96:96, 74112 >>>

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread ShreeDevi Kumar
/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_N >> AGARI_SHREE_L3.exp0.lstmf >> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_N >> AGARI_SHREE_L3.exp-1.lstmf >> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adob

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread ShreeDevi Kumar
dic/san.AA_N >> AGARI_SHREE_L3.exp0.lstmf >> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_N >> AGARI_SHREE_L3.exp-1.lstmf >> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adob >> e_Devanagari.exp-2

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread ShreeDevi Kumar
../tesstutorial/vedic/san.AA_N >> AGARI_SHREE_L3.exp-1.lstmf >> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adob >> e_Devanagari.exp-2.lstmf >> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adob >> e_Devanagari.exp1.lstmf >> >> >> ShreeDevi >> ___

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread ShreeDevi Kumar
../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > did you build the training to

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread ShreeDevi Kumar
did you build the training tools again? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Aug 5, 2017 at 6:37 PM, Ava Nimaee wrote: > yes, you said me and i clone last tesseract-master and

Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata

2017-08-05 Thread ShreeDevi Kumar
Are you using the latest source of programs from github for building tesseract? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Aug 5, 2017 at 6:21 PM, Ava Nimaee wrote: > Hi > i used

Re: [tesseract-ocr] Failed to load list of training filenames from

2017-08-05 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tessdata/issues/70#issuecomment-320441568 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Aug 5, 2017 at 6:22 PM, Ava Nimaee wrote: > we tried

Re: [tesseract-ocr] Failed to load list of training filenames from

2017-08-04 Thread ShreeDevi Kumar
Please try the OCR with the new tessdata/best/fas.traineddata (Farsi/Persian) and provide your feedback for Ray to improve the training. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Aug 4, 2017 at 6:40 PM, Ava

Re: [tesseract-ocr] Failed to load list of training filenames from

2017-08-04 Thread ShreeDevi Kumar
​Please check tesseract training wiki for new instructions. https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 Use the latest code from github.​ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On

Re: [tesseract-ocr] "Can't encode transcript" error when using "lstmtraining" command with Tess4.0

2017-08-01 Thread ShreeDevi Kumar
Ray has uploaded new traineddata files in https://github.com/tesseract-ocr/tessdata/tree/master/best Why don't you first try recognition with that ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 1, 2017 at

Re: [tesseract-ocr] ERROR: Could not find training text file

2017-07-31 Thread ShreeDevi Kumar
add a line similar to following to your training command, pointing to where you have your training text --training_text ../langdata/eng/eng.training_text \ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul

Re: [tesseract-ocr] Building tesseract 4.0.0 from master on OS X

2017-07-30 Thread ShreeDevi Kumar
I do not have a Mac so cannot check. But you can try the option "with-training-tools", "Install OCR training tools" with homebrew install along with the --HEAD option. Please add a comment to the existing macOS issue on GitHub if you still face a problem. -- You received this message because you

Re: [tesseract-ocr] Building tesseract 4.0.0 from master on OS X

2017-07-30 Thread ShreeDevi Kumar
Also see https://github.com/Homebrew/homebrew-core/blob/master/Formula/tesseract.rb ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul 31, 2017 at 9:32 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

Re: [tesseract-ocr] Building tesseract 4.0.0 from master on OS X

2017-07-30 Thread ShreeDevi Kumar
Please see the following for the suggested solutions https://github.com/tesseract-ocr/tesseract/issues/864 Can't Install Latest Head With Brew https://github.com/tesseract-ocr/tesseract/issues/830 3.05 can't be be built as Standalone Self-contained Tesseract-OCR for Mac Regarding

Re: [tesseract-ocr] Re: Combining tessdata files Error opening unicharset file

2017-07-28 Thread ShreeDevi Kumar
You need to mv or rename the files so that they carry the por. prefix. Then, when you use the combine_tessdata command, it will use all por. files to create the traineddata. See https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh mv ${TRAINING_DIR}/inttemp
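The renaming step could be sketched as follows. This is a minimal sketch under stated assumptions, not the code from tesstrain_utils.sh: `add_lang_prefix` is a hypothetical helper name, and the component file names in the example are illustrative, since the original command above is cut off.

```shell
# Rename component files in a directory so each carries a language prefix,
# e.g. inttemp -> por.inttemp. Files that do not exist are skipped.
# add_lang_prefix is a hypothetical helper, not part of Tesseract.
add_lang_prefix() {
    dir="$1"
    lang="$2"
    shift 2
    for name in "$@"; do
        if [ -f "$dir/$name" ]; then
            mv "$dir/$name" "$dir/$lang.$name"
        fi
    done
}

# Example (component names are assumptions for illustration):
#   add_lang_prefix ./training por inttemp pffmtable normproto unicharset
#   combine_tessdata ./training/por.
```

Passing the path with the trailing `por.` lets combine_tessdata pick up every file that starts with that prefix, which is why all components need the same prefix first.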

Re: [tesseract-ocr] Re: Combining tessdata files Error opening unicharset file

2017-07-27 Thread ShreeDevi Kumar
What command did you use? Make sure that all components are there as listed. It looks like only the unicharset was available for building your traineddata. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jul 27,

Re: [tesseract-ocr] Page segmentation and preserve_interword_space are not working

2017-07-26 Thread ShreeDevi Kumar
Try 'tsv' instead of 'hocr' ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Jul 26, 2017 at 10:30 PM, Prav wrote: > Hi, > > I am trying to extract tabular data. For this I am converting the

Re: [tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428

2017-07-26 Thread ShreeDevi Kumar
Which version of tesseract are you using? Which platform? Try building the latest code from github and use that. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Jul 25, 2017 at 9:02 PM, Ava Nimaee

Re: [tesseract-ocr] Could not find font named AR PL UMing Patched Light

2017-07-26 Thread ShreeDevi Kumar
I do not have this font. The training is done at Google. They probably use a number of commercial fonts in addition to freely available fonts. The fonts are not provided as part of the training data. You have to get your own set of fonts to train or wait for the new traineddata by Ray (expected

Re: [tesseract-ocr] Could not find font named AR PL UMing Patched Light

2017-07-25 Thread ShreeDevi Kumar
The training process uses the list of fonts from https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh You need to update it to match the fonts available to you for the script you are training and include the correct location for the fonts directory. ShreeDevi

Re: [tesseract-ocr] "Can't encode transcript" error when using "lstmtraining" command with Tess4.0

2017-07-25 Thread ShreeDevi Kumar
That error occurs because some characters in your training text are not part of the unicharset of chi_sim. You are trying fine-tune training, which will give this error. Replace-top-layer training will work. I suggest that you wait 2-3 weeks for Ray to upload new traineddata for all languages. You can tell us if

Re: [tesseract-ocr] Combine_tessdata command error while training Tesseract4.0

2017-07-24 Thread ShreeDevi Kumar
Is your traineddata file present at ../tessdata/nor.traineddata? Is it 4.00 version? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul 24, 2017 at 1:47 PM, wrote: > Hello, > > I'm

Re: [tesseract-ocr] Using TessPDFRenderer in tesseract 3.05 in C++

2017-07-21 Thread ShreeDevi Kumar
Take a look at tesseractmain.cpp: api->GetBoolVariable("tessedit_create_pdf", &b); if (b) { bool textonly;

Re: [tesseract-ocr] Train tess4 LSTM with own images

2017-07-21 Thread ShreeDevi Kumar
currently lstm training is only supported for box/tiff pairs generated by text2image via tesstrain.sh script. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jul 21, 2017 at 12:55 PM, Sophea PRUM

Re: [tesseract-ocr] Using TessPDFRenderer in tesseract 3.05 in C++

2017-07-21 Thread ShreeDevi Kumar
Are you able to create pdfs using commandline? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jul 21, 2017 at 12:09 PM, Roger Jefferson < roger.t.jeffer...@gmail.com> wrote: > I want to use tesseract 3.05 to

Re: [tesseract-ocr] Re: train a new font for language of persian

2017-07-18 Thread ShreeDevi Kumar
I would suggest that you wait a few weeks more for Ray to upload the new traineddata files for tesseract4.0.0beta and then try it. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Jul 19, 2017 at 10:30 AM, Ava

Re: [tesseract-ocr] tesseract 4 skips over some text

2017-07-18 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/681#issuecomment-303027906 You can try changing those constants to see if you get any improvement. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Jul

[tesseract-ocr] Fwd: [tesseract-ocr/tesseract] Tag a new version for LSTM 4.0 (#995)

2017-07-11 Thread ShreeDevi Kumar
​Forwarding update by Ray. -- Forwarded message -- From: theraysmith Date: Wed, Jul 12, 2017 at 5:55 AM Subject: Re: [tesseract-ocr/tesseract] Tag a new version for LSTM 4.0 (#995) To: tesseract-ocr/tesseract I'm about

Re: [tesseract-ocr] While extracting numbers tesseract makes a lot of errors

2017-07-09 Thread ShreeDevi Kumar
If using 3.05 branch try configs such as digits whitelist ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Jul 9, 2017 at 7:36 PM, Prav wrote: > Any suggestions for any configuration which i

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-07-07 Thread ShreeDevi Kumar
​for 3.05 don't you need to checkout the 3.05 branch??​ master is for 4.0 development. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jul 7, 2017 at 9:22 PM, akhil katpally wrote: >

Re: [tesseract-ocr] Simple images, trying to get the better results

2017-07-05 Thread ShreeDevi Kumar
Try with a higher dpi for output images - 300 or 600. Also check out other psm values. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Jul 5, 2017 at 8:13 AM, Marcos Benatti wrote: > Hello

Re: [tesseract-ocr] Re: Image file not found

2017-07-04 Thread ShreeDevi Kumar
see https://groups.google.com/forum/#!topic/tesseract-ocr/l918_ouIH98 https://groups.google.com/forum/#!topic/tesseract-ocr/hOvr20u71dY https://groups.google.com/forum/#!topic/tesseract-ocr/nr095u8w7iU -- You received this message because you are subscribed to the Google Groups

Re: [tesseract-ocr] Re: Image file not found

2017-07-02 Thread ShreeDevi Kumar
you can browse source code via doxygen at https://ub-mannheim.github.io/tesseract/a00113_source.html for page segmentation, follow the links. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Jul 2, 2017 at 3:11 PM,

Re: [tesseract-ocr] Image file not found

2017-07-02 Thread ShreeDevi Kumar
These errors are from leptonica. The image processing within tesseract is limited. It is preferable to preprocess image before calling tesseract. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Jul 2, 2017 at

Re: [tesseract-ocr] Errors on all commandline options

2017-06-29 Thread ShreeDevi Kumar
--psm works for 3.05.01 and 4.00.00alpha. With older versions, try -psm ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jun 29, 2017 at 8:20 PM, Brian wrote: > trying to run > > tesseract infile.tif outfile

Re: [tesseract-ocr] Re: Tesseract library 6.0.4 queries

2017-06-27 Thread ShreeDevi Kumar
>tesseract library version 6.0.4 Tesseract-ocr stable version is 3.05.01 and development branch is for 4.0. Are you referring to a different project that uses tesseract? For licensing, see https://github.com/tesseract-ocr/tesseract#license For performance, see

Re: [tesseract-ocr] Re: Need help training Simplified Chinese.

2017-06-26 Thread ShreeDevi Kumar
On Tue, Jun 27, 2017 at 10:18 AM, Clement wrote: > I downloaded the alpha source code from the link below: > https://github.com/tesseract-ocr/tesseract/releases/tag/4.00.00alpha > > I installed using the following commands: > $ ./autogen.sh > $ ./configure PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

Re: [tesseract-ocr] ./configure failling for me

2017-06-26 Thread ShreeDevi Kumar
Also see https://github.com/tesseract-ocr/tesseract/issues/919 related to building on Centos ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Jun 27, 2017 at 8:54 AM, ShreeDevi Kumar <shreesh...@gmail.com>

Re: [tesseract-ocr] ./configure failling for me

2017-06-26 Thread ShreeDevi Kumar
Have you tried: ensure that autoconf-archive is installed. Don't forget to run ./autogen.sh after the installation of autoconf-archive. as per https://github.com/tesseract-ocr/tesseract/wiki/Compiling ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] Re: Need help training Simplified Chinese.

2017-06-25 Thread ShreeDevi Kumar
>> I installed Tesseract 4.00alpha on Linux. How did you install it? Did you use the latest code from github? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Jun 25, 2017 at 8:18 PM, Clement

Re: [tesseract-ocr] Trainer GUI for Tesseract version 4.0

2017-06-24 Thread ShreeDevi Kumar
Take a look at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 for an overview of training for 4.0. Follow the tutorials to get a feel of the training process - you can try for English as well as Malayalam. In terms of trainer GUI, I think that it will probably work for

Re: [tesseract-ocr] Trainer GUI for Tesseract version 4.0

2017-06-24 Thread ShreeDevi Kumar
You can update it for 3.05.01 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Jun 24, 2017 at 6:59 PM, Nalin Linux wrote: > I where developing a Tesseract trainer GUI which makes

Re: [tesseract-ocr] I am looking for the best way to OCR scan sports scoreboards (such as stadium scoreboards) for such items as time and scores

2017-06-23 Thread ShreeDevi Kumar
Take a look at https://www.unix-ag.uni-kl.de/~auerswal/ssocr/ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jun 22, 2017 at 11:58 PM, wrote: > I am experimenting with Tesseract >

Re: [tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread ShreeDevi Kumar
>what is the number of the iterations that will for sure cover the 40 lstmf files? It will depend on number of lines in each file eg. If each file has 1000 lines, then 40,000 iterations should cover all files once. You can use --target_error_rate 0.01 instead of number of iterations as a
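The arithmetic in this reply can be written out as a short shell sketch. It assumes one training line is consumed per iteration and that every .lstmf file holds the same number of lines; `iterations_for_one_pass` is a hypothetical helper name.

```shell
# Iterations needed to visit every training line once, assuming one line
# per iteration and an equal number of lines in each .lstmf file.
# iterations_for_one_pass is a hypothetical helper name.
iterations_for_one_pass() {
    files="$1"
    lines_per_file="$2"
    echo $(( files * lines_per_file ))
}

iterations_for_one_pass 40 1000   # prints 40000
```

If the files differ in size, summing the actual line counts would give a better estimate than multiplying.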

Re: [tesseract-ocr] Need help training Simplified Chinese.

2017-06-22 Thread ShreeDevi Kumar
Your best bet for improving recognition is to preprocess the small and medium images to larger size. Please see https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality Tesseract 4.00.00alpha currently has two different ocr engines in it. The legacy tesseract engine is accessible with

Re: [tesseract-ocr] Re: unicharset_extractor extracting zero values

2017-06-20 Thread ShreeDevi Kumar
Master branch currently includes the legacy engine. So you can easily build your custom traineddata using the following command (modify it for your fonts location, training text, font name etc) training/tesstrain.sh \ --fonts_dir ~/.fonts \ --tessdata_dir ../tessdata \ --training_text

Re: [tesseract-ocr] bad result on tesseract(4.0) with lstm

2017-06-20 Thread ShreeDevi Kumar
Your input image quality needs to be improved. Also test with --oem 1 alone. Please test with https://github.com/tesseract-ocr/tesseract/blob/master/testing/hebtypo.jpg and see if you get similar results. for hocr, just adding hocr to the command line should work - as long as you have the hocr

Re: [tesseract-ocr] Re: unicharset_extractor extracting zero values

2017-06-20 Thread ShreeDevi Kumar
> Do you know why my tesseract isnt compiling ? I would really love a updated version on my ubuntu. Not sure. I haven't built 3.05 branch. For master, I follow the usual autotools method. Have you also built leptonica? Make sure you don't have any old leptonica version already. Make sure you

Re: [tesseract-ocr] Tesseract 4.00.00alpha Windows doesn't find image files

2017-06-20 Thread ShreeDevi Kumar
Please show the command line you used followed by the error. You may have to put filename in quotes if there are spaces in it. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jun 19, 2017 at 9:32 PM, J.
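Why quoting matters can be seen with a small sketch for sh or bash. The file name is hypothetical, and `count_args` is only a stand-in that reports how many arguments a program such as tesseract would receive; the Windows command prompt applies the same double-quote convention to paths with spaces.

```shell
# Report how many separate arguments a command receives.
count_args() {
    echo "$#"
}

img="scan page 1.png"   # hypothetical file name containing spaces

count_args $img      # unquoted: split on spaces, prints 3
count_args "$img"    # quoted: passed as one file name, prints 1
```

With the unquoted form, the program sees three file names, none of which exists, which typically surfaces as an "image file not found" style error.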

Re: [tesseract-ocr] How to improve the recognition of receipt (text not in words dictionary)

2017-06-20 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/960#issuecomment-305966719 On stable 3.0x, you can try adding your product catalog to the eng.user-words file and check for improvement. In my unit test, it seemed to apply the words from the user dict. Alternatively, you can also try with the

[tesseract-ocr] error building 3.05.01

2017-06-19 Thread ShreeDevi Kumar
Sorry, I haven't built 3.05.01. Hope others can help. On Tue, Jun 20, 2017 at 2:32 AM, David Barishev wrote: > hey, i try to build tesseract from source now, and after i have > built Leptonica, i couldn't build tesseract with this error : > > /bin/bash ../libtool

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
I would also suggest that you add spaces between words in your input text, ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jun 19, 2017 at 9:19 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > ​You c

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
On Mon, Jun 19, 2017 at 9:05 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > Where do you have your source files for english langdata? > > If it is in a directory such as ../langdata/eng/ > then put the common.unicharset, latin.unicharset and font_properties etc

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
Where do you have your source files for english langdata? If it is in a directory such as ../langdata/eng/ then put the common.unicharset, latin.unicharset and font_properties etc in ../langdata ShreeDevi भजन - कीर्तन - आरती @
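A quick way to confirm that layout is a small shell check. This is a sketch only: `missing_langdata_files` is a hypothetical helper name, and the file names in the example are copied as written in the message, so adjust them to match the actual names in your langdata checkout.

```shell
# Print the name of every expected file that is absent from a langdata
# directory. Prints nothing when all of them are present.
# missing_langdata_files is a hypothetical helper, not a Tesseract tool.
missing_langdata_files() {
    langdata_dir="$1"
    shift
    for name in "$@"; do
        [ -e "$langdata_dir/$name" ] || echo "$name"
    done
}

# Example, with the file names as written in the message:
#   missing_langdata_files ../langdata common.unicharset latin.unicharset font_properties
```

The language-specific files stay in the subdirectory (for example ../langdata/eng/), while the shared files checked here sit one level up in ../langdata.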

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
Do you have the common and Latin unicharset files in your langdata directory? See https://github.com/tesseract-ocr/langdata Try to build the latest 3.05.01 version. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jun 19,

Re: [tesseract-ocr] Newbie: Trying to scan IBM Selectric "script" typeface

2017-06-16 Thread ShreeDevi Kumar
Glad it worked for you. 4.0 LSTM version is still under active development. I am curious to know whether you 'cloned' the repository for latest version or used the source from https://github.com/tesseract-ocr/tesseract/releases. ShreeDevi

Re: [tesseract-ocr] Newbie: Trying to scan IBM Selectric "script" typeface

2017-06-16 Thread ShreeDevi Kumar
Which version of tesseract are you using? Using the latest code from github with eng.traineddata I get the following: tesseract AutoRedCow_01.png stdout --psm 3 --oem 1 -l eng Hi! My name is Cow,. Not just any kind of cow, but Cow spelled with a capital 'C'. No, 1 never had any other name - Zike

Re: [tesseract-ocr] large char set language training

2017-06-16 Thread ShreeDevi Kumar
Yes, there is a method for rendering synthetic training data from training_text and fonts via text2image program and tesstrain.sh script. https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh

Re: [tesseract-ocr] How to regenerate the training text

2017-06-15 Thread ShreeDevi Kumar
You can also see https://ancientgreekocr.org/ for Nick White's method of creating training data for Ancient Greek. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jun 16, 2017 at 8:18 AM, ShreeDevi Kumar <shre
