Re: [tesseract-ocr] mftraining Segmentation fault error

2016-11-04 Thread ShreeDevi Kumar
onts then? >>> Or is this purely to be able to train using these fonts? >>> >>> Might there be another way to use the training for such a large amount >>> of fonts? >>> Can training the fonts into multiple language files then be the solution? >>> >>> >>>

Re: [tesseract-ocr] Failure to recognize columns

2016-10-13 Thread ShreeDevi Kumar
Try psm 6, also 11, 12 https://github.com/tesseract-ocr/tesseract/issues/434 On 13 Oct 2016 1:13 p.m., "fuzzy7k" wrote: > I tried psm 0-3 > > On Thursday, October 13, 2016 at 1:46:45 AM UTC-4, shree wrote: >> >> Which page segmentation mode (psm) did you try? >> >> On 12 Oct

Re: [tesseract-ocr] Failure to recognize columns

2016-10-12 Thread ShreeDevi Kumar
Which page segmentation mode (psm) did you try? On 12 Oct 2016 11:21 p.m., "fuzzy7k" wrote: > I have scanned some index pages that I would like to ocr for rapid > searching. I am using tesseract from the command line. The problem is that > tesseract ignores the whitespace

Re: [tesseract-ocr] Tesseract 4.0: VGSLSpecs

2016-12-16 Thread ShreeDevi Kumar
+ Ray Smith On 16-Dec-2016 10:58 PM, "Kay-Michael Würzner" wrote: > Yes, I did and in principle everything works like a charm which is great. > What I want to accomplish now is some understanding: Why do I have to set a > documented parameter in some undocumented way or to

Re: [tesseract-ocr] Tesseract 4.0: VGSLSpecs

2016-12-16 Thread ShreeDevi Kumar
Did you try out the commands as per the LSTM training tutorial? On 16-Dec-2016 8:31 PM, "Kay-Michael Würzner" wrote: > Dear @, > > I played around with training the new LSTM mode. According to the > documentation of the network specification (https://github.com/tesseract- >

Re: [tesseract-ocr] Re: pdf -> searchable PDF

2017-01-13 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/83 and other PDF related issues in GitHub repo with similar discussion. - excuse the brevity, sent from mobile On 13-Jan-2017 10:15 PM, "James R Barlow" wrote: > Tesseract cannot rasterize PDFs. It is fairly

Re: [tesseract-ocr] LSTM training error after some iterations

2017-01-14 Thread ShreeDevi Kumar
Try without the following line. --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Jan 14, 2017 at 3:47 AM, wrote: > I tried to

Re: [tesseract-ocr] LSTM training error after some iterations

2017-01-14 Thread ShreeDevi Kumar
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Jan 14, 2017 at 6:14 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > Try without the following line. > > --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \

[tesseract-ocr] Ground Truth from Box Files

2017-01-06 Thread ShreeDevi Kumar
Does anyone know of any utilities to convert a box file to ground truth text file? I am using tesstrain.sh which uses text2image for trying out LSTM training. However, because unrenderable words are not included in the tifs, it is not possible to use the training_text as ground truth. Thanks!

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-06 Thread ShreeDevi Kumar
the traineddata for norlayer0.853_1615.lstm i.e. 0.853 % character error rate at iteration number 1615. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jan 6, 2017 at 5:59 PM, ShreeDevi Kumar <shreesh...@gmail.com>
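
To make the naming in that message concrete, here is a minimal sketch that splits such a checkpoint name into its error rate and iteration count. It assumes the name has the shape <base><error>_<iteration>.lstm, which is inferred only from the single example norlayer0.853_1615.lstm above; parse_checkpoint_name is a hypothetical helper, not a tesseract tool.

```shell
# Hypothetical helper: split a checkpoint file name such as
# norlayer0.853_1615.lstm into "<error rate> <iteration>".
# The naming pattern is an assumption based on one example.
parse_checkpoint_name() {
  local name="${1##*/}"            # drop any directory part
  name="${name%.lstm}"             # drop the .lstm extension
  local iteration="${name##*_}"    # text after the last underscore
  local rest="${name%_*}"          # text before the last underscore
  local error="${rest##*[!0-9.]}"  # trailing run of digits and dots
  printf '%s %s\n' "$error" "$iteration"
}
```

For example, parse_checkpoint_name norlayer0.853_1615.lstm prints the character error rate followed by the iteration number.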

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-04 Thread ShreeDevi Kumar
Ray is planning to retrain the languages for the new 4.0.0 version sometime in January. So it would be helpful if you could open an issue on https://github.com/tesseract-ocr/langdata/issues with this information. Also, if you can provide a sample representative Norwegian text including Æ, I will

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-05 Thread ShreeDevi Kumar
I will give it a try and let you know. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send

[tesseract-ocr] Re: Swedish language

2017-01-08 Thread ShreeDevi Kumar
at 9:36 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > Peter, > > Please see https://github.com/tesseract-ocr/langdata/blob/master/swe/ > swe.training_text > > You can provide additional training text if some needed characters are > missing in the abov

Re: [tesseract-ocr] How should Vs2015 solve this problem ?

2016-12-28 Thread ShreeDevi Kumar
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Dec 29, 2016 at 12:26 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > Please rebuild leptonica with the latest source from github ( > https://github.com/DanBloomberg/leptonica) > and then re

Re: [tesseract-ocr] How should Vs2015 solve this problem ?

2016-12-28 Thread ShreeDevi Kumar
Please rebuild leptonica with the latest source from github ( https://github.com/DanBloomberg/leptonica) and then rebuild tesseract with the latest source from github ( https://github.com/tesseract-ocr/tesseract) and try. ShreeDevi भजन

Re: [tesseract-ocr] Again: read_params_file: parameter not found: II*

2017-01-01 Thread ShreeDevi Kumar
What about osd.traineddata and config files? Are they in your tessdata directory? - excuse the brevity, sent from mobile On 01-Jan-2017 9:22 PM, wrote: > Hi all, > > I'm in a time critical situation. I want to deliver a new software for our > customer on 5th

Re: [tesseract-ocr] Again: read_params_file: parameter not found: II*

2017-01-01 Thread ShreeDevi Kumar
Is the TESSDATA_PREFIX variable set in the environment? If so, which directory is it pointing to? - excuse the brevity, sent from mobile On 01-Jan-2017 9:35 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > What about osd.traineddata and config files? Are th
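
The two questions in this thread (is the variable set, and are the data files where it points) can be checked with a short sketch like the one below. find_traineddata is a hypothetical helper. Whether TESSDATA_PREFIX should name the tessdata directory itself or its parent depends on the tesseract version, so as an assumption the sketch simply looks in both places.

```shell
# Hypothetical helper: report where <lang>.traineddata is found
# relative to TESSDATA_PREFIX. Checks both the directory itself and
# a tessdata/ subdirectory, because the expected layout differs
# between tesseract versions (an assumption, verify for your build).
find_traineddata() {
  local lang="${1:-eng}" prefix="${TESSDATA_PREFIX:-}" candidate
  if [ -z "$prefix" ]; then
    echo "TESSDATA_PREFIX is not set" >&2
    return 1
  fi
  for candidate in "$prefix/$lang.traineddata" "$prefix/tessdata/$lang.traineddata"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no $lang.traineddata under $prefix" >&2
  return 1
}
```

Calling find_traineddata osd would cover the osd.traineddata question raised earlier in the thread.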

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-08 Thread ShreeDevi Kumar
PC froze so >> I rebooted and created the traineddata for norlayer0.853_1615.lstm i.e. >> 0.853 % character error rate at iteration number 1615. >> >> >> ShreeDevi >> ____ >> भजन - कीर्तन - आरती @ http://bhaj

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-09 Thread ShreeDevi Kumar
ta >>>> >>>> See attached log and info file for commands used in training. It took >>>> about 9 hours on my pc - about 1700 iterations only and then my PC froze so >>>> I rebooted and created the traineddata for norlayer0.853_1615.lstm i.e. >>&g

Re: [tesseract-ocr] Re: Tesseract v3.03 and norwegian language

2017-01-05 Thread ShreeDevi Kumar
Tried 'Finetune' - that does not help in addition of a character. Trying 'Add a layer' now. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jan 5, 2017 at 8:59 PM, Ludvig F Aarstad wrote: >

[tesseract-ocr] Swedish language

2017-01-06 Thread ShreeDevi Kumar
Peter, Please see https://github.com/tesseract-ocr/langdata/blob/master/swe/swe.training_text You can provide additional training text if some needed characters are missing in the above. I can do a test training with it. - excuse the brevity, sent from mobile On 06-Jan-2017 5:01 PM, "Peter"

Re: [tesseract-ocr] unpack [lang].traineddata

2016-12-19 Thread ShreeDevi Kumar
combine_tessdata -u ara.traineddata ara. On 19-Dec-2016 1:57 PM, "universal reseller" wrote: > this is not a zip file..

Re: [tesseract-ocr] Re: tesseract installed but cannot be found in cmd console (win7)

2016-12-21 Thread ShreeDevi Kumar
You also need to add the location of tesseract binaries to PATH. - sent from mobile phone On 22-Dec-2016 9:50 AM, "Junmock Lee" wrote: > How To Add/Edit Environment Variables in Windows 7 > https://www.nextofwindows.com/how-to-addedit-environment- >

Re: [tesseract-ocr] Can't run tesseract with LSTM

2017-03-23 Thread ShreeDevi Kumar
There might be some problem with your input file - all the following work for me. Please note that whitelist has no effect in 4.0.
$ tesseract input.tif input
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Page 1
$ tesseract input.tif input --psm 7
Tesseract Open Source OCR Engine

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
The initial 4.0alpha tag from November has cube in it. It was deleted later and is no longer in master. In fact, the OEM code for LSTM was originally 4 and now is 2. Shouldn't semantic versioning require tagging at major updates? - excuse the brevity, sent from mobile On 22-Mar-2017 8:58 PM,

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance - excuse the brevity, sent from mobile On 22-Mar-2017 8:58 PM, "universal reseller" wrote: > how did you used cube engine on tesse 4 !?

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
Sorry, mentioned incorrect code for LSTM. OCR Engine modes:
0 Original Tesseract only.
1 Neural nets LSTM only.
2 Tesseract + LSTM.
3 Default, based on what is available
- excuse the brevity, sent from mobile On 22-Mar-2017 9:02 PM, "ShreeDevi Kumar" <shreesh

Re: [tesseract-ocr] Having issue with Italic characters

2017-03-24 Thread ShreeDevi Kumar
Use Tesseract 4.0.0alpha and --oem 1 for LSTM. It works ok with that. --oem 0 with legacy engine gives / instead of i. you could test to see if a better dpi image(300 dpi) works with the legacy engine. ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] Re: How to download the Tesseract trained data for Digital display numbers ( Seven Segments Data trained data )

2017-03-27 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tesseract/wiki/AddOns has link to traineddata for digital seven fonts. https://github.com/arturaugusto/display_ocr You can download various digital seven fonts, create training data images and train - all in Jtessboxeditor. Use 3.0x version ShreeDevi

Re: [tesseract-ocr] Low Accurate ini bold font

2017-03-27 Thread ShreeDevi Kumar
Try latest version of tesseract - build from master. Use --psm 7 --oem 1 I get correct result for both. tesseract unnamed1.png unnamed1 --psm 7 --oem 1 Tesseract Open Source OCR Engine v4.00.00alpha-347-g60c8b12 with Leptonica Warning. Invalid resolution 0 dpi. Using 70 instead. ShreeDevi

Re: [tesseract-ocr] Can't run tesseract with LSTM

2017-03-23 Thread ShreeDevi Kumar
what version of tesseract are you running? If you built it, which commit source have you used? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 23, 2017 at 4:28 PM, Jenkar Smithy wrote:

Re: [tesseract-ocr] Can't run tesseract with LSTM

2017-03-23 Thread ShreeDevi Kumar
Ok. I am using an older version ...
git log -1
commit 0ff26ee3de166659970d80e50aef4000ff2557b2
Author: zdenop
Date: Fri Feb 3 08:15:15 2017 +0100
Merge pull request #698 from stweil/configure
configure: Run AVX test only with 64 bit compiler
Please try with that.

Re: [tesseract-ocr] How to create a PDF ?

2017-03-23 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage also check that u have pdf.ttf in your tessdata folder https://github.com/tesseract-ocr/tesseract/tree/master/tessdata tesseract --tessdata-dir ./ ./testing/eurotext.png ./testing/eurotext-eng -l eng pdf ShreeDevi

Re: [tesseract-ocr] How to create a PDF ?

2017-03-23 Thread ShreeDevi Kumar
in https://github.com/tesseract-ocr/tesseract/tree/master/tessdata ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 23, 2017 at 7:04 PM, Saliaj Adrian wrote: > No I don't have pdf.ttf in my

Re: [tesseract-ocr] Tesseract 4 LSTM vs TesseractAndCube performance

2017-03-22 Thread ShreeDevi Kumar
March 22, 2017 at 12:04:24 PM UTC-4, shree wrote: >> >> Sorry, mentioned incorrect code for LSTM >> >> OCR Engine modes: >> 0 Original Tesseract only. >> 1 Neural nets LSTM only. >> 2 Tesseract + LSTM. >> 3 Default, base

[tesseract-ocr] Re: seven segment display - 4.0 traineddata

2017-03-29 Thread ShreeDevi Kumar
FYI - this was trained using eng.traineddata and finetuned with 7segment fonts. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Mar 29, 2017 at 9:09 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: >

[tesseract-ocr] seven segment display - 4.0 traineddata

2017-03-29 Thread ShreeDevi Kumar
Hi, I have built a 4.0 traineddata using some seven segment display fonts. Trained mostly on numbers 0-9, capital letters A-Z, : etc. It is uploaded as a zip file at https://github.com/Shreeshrii/tessdata4alpha/raw/master/ssd1.zip unzip to get ssd1.traineddata. I have not tested it much.

Re: [tesseract-ocr] Invalid resolution 0 dpi. Using 70 instead.

2017-03-29 Thread ShreeDevi Kumar
The problem is with the input image. It does not have correct information about dpi. Please preprocess image to 300 dpi for better output. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Mar 29, 2017 at 8:40 AM,

Re: [tesseract-ocr] Re: tesseract4 x64 Windows dlls?

2017-03-25 Thread ShreeDevi Kumar
Added link in wiki - https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM @THintz, please fix your readme file, >cd \petri mkdir Win64 cd Win64 git clone https://github.com/tesseract-ocr/tesseract tesseract cd tesseract cppan (I assume this wasn't necessary, but I'm trying to avoid

Re: [tesseract-ocr] Re: tesseract4 x64 Windows dlls?

2017-03-16 Thread ShreeDevi Kumar
Egor (cc:ed) can provide guidance regarding cppan and cmake. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 16, 2017 at 6:30 PM, THintz wrote: > I spoke too soon. Apparently I touched

Re: [tesseract-ocr] tesseract multiply .png files to singular .txt file

2017-03-16 Thread ShreeDevi Kumar
Gui front-end for tesseract such as Vietocr and gimagereader will also allow for batch processing of multiple files. - excuse the brevity, sent from mobile On 16-Mar-2017 9:13 PM, "Lako" wrote: > Hi, > > Apologies for the beginner question, unfortunately I am fairly

Re: [tesseract-ocr] tesseract multiply .png files to singular .txt file

2017-03-16 Thread ShreeDevi Kumar
Please inform what environment you are running in, Linux, windows, etc. Basically, you need to set up a loop which will process all .PNG files and concatenate the OCR results. - excuse the brevity, sent from mobile On 16-Mar-2017 9:13 PM, "Lako" wrote: > Hi, > >
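
The loop described in that reply could look roughly like the sketch below, assuming a Linux shell (bash) and that the tesseract binary is on PATH. ocr_pngs_to_one_file is a hypothetical helper name. It relies on the documented convention that giving stdout as the output base makes tesseract print the recognized text instead of writing a file.

```shell
# Hypothetical helper: OCR every .png in a directory and append the
# recognized text of each page to a single output file.
ocr_pngs_to_one_file() {
  local dir="$1" out="$2" img
  : > "$out"                     # start with an empty output file
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue    # nothing to do if the glob matched no files
    # "stdout" as the output base sends the text to standard output
    tesseract "$img" stdout >> "$out"
  done
}
```

A call such as ocr_pngs_to_one_file ./scans all.txt processes the images in the shell's glob (alphabetical) order, so page files should be named so that they sort correctly.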

Re: [tesseract-ocr] First time user

2017-03-20 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage On windows Tesseract.exe loc.tif loc Make sure tesseract.exe binary is in PATH and that the TESSDATA_PREFIX variable points to where u have the traineddata files. - excuse the brevity, sent from mobile On 20-Mar-2017 11:22 AM,

Re: [tesseract-ocr] Re: New beginner

2017-03-21 Thread ShreeDevi Kumar
Make sure your input file phototest.tiff is in C:\Program Files\Tesseract-OCR Otherwise give full path to file. Main error is image file not found ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Mar 21, 2017

Re: [tesseract-ocr] Re: tesseract4 x64 Windows dlls?

2017-03-15 Thread ShreeDevi Kumar
Thanks for sharing how you made the x64 solution for Visual Studio. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Mar 15, 2017 at 9:44 PM, THintz wrote: > I follow the github instructions

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-16 Thread ShreeDevi Kumar
You did not mention from where you installed leptonica and tesseract. what info do you see when you type tesseract -v ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 16, 2017 at 2:21 PM, Kazi Moinul Hossain

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-16 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/233 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 16, 2017 at 2:41 PM, Kazi Moinul Hossain wrote: > Tesseract

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-20 Thread ShreeDevi Kumar
_ >> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >> >> On Fri, Mar 17, 2017 at 7:09 PM, ShreeDevi Kumar <shree...@gmail.com> >> wrote: >> >>> try >>> >>> sudo apt-get remove libleptonica-dev >>> >>> ShreeDe

Re: [tesseract-ocr] Recognition of trademark symbol

2017-03-17 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/654#issuecomment-274574951 for more details about LSTM training. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Mar 13, 2017 at 8:35 PM, Martin

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar
>Is there anything more you did in the "src" and "prog" directory under leptonica folder like "make allheaders", "make xtractprotos"? No.

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar
I use the following batch files in the folders where I have cloned tesseract and leptonica.
1. leptonica
#!/bin/bash
git pull origin
./autobuild
#./configure --disable-dependency-tracking
./configure
make
sudo make install
sudo ldconfig
cd prog
make
cd ..
2. tesseract
#!/bin/bash
./autogen.sh

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar
PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > I use the following batch files in the folders where I have cloned > tesseract and leptonica. > > 1. leptonica > > #!/bin/bash > git pull origin > ./autobuild > #./configure --disable-dependency-tracking > ./

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar
Also you have not responded to zdenko's suggestion to provide output of ldd tesseract or ldd /usr/local/bin/tesseract (use the location of tesseract, which you can find by which tesseract)

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar
try sudo apt-get remove libleptonica-dev ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Mar 17, 2017 at 6:06 PM, Kazi Moinul Hossain wrote: > how can i uninstall old leptonica fully? I

Re: [tesseract-ocr] Compilation problem for tesseract 4.00.00

2017-03-17 Thread ShreeDevi Kumar
sudo apt-get remove libleptonica-dev libleptonica ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Mar 17, 2017 at 7:09 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > try > > sudo apt-get remo

Re: [tesseract-ocr] VietOCR 5.0 alpha availability

2017-04-03 Thread ShreeDevi Kumar
You need to get vietocr 5.0 alpha for tesseract 4.0 alpha https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/ https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/ ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar
Read https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread ShreeDevi Kumar
4.0 is alpha software. Please use an older released version. - excuse the brevity, sent from mobile On 05-Apr-2017 1:55 PM, wrote: > After u have said, > > I tried in two ways and i am stuck at lstm step: > > Training > > command used: > >

Re: [tesseract-ocr] Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-05 Thread ShreeDevi Kumar
Have you tried just using the eng.traineddata directly with tess 3.04/ 3.05 / 4.0? You don't need to train unless it is a very special case. You can try changing the dictionary dawg files with tess 3.0x. ShreeDevi भजन - कीर्तन -

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread ShreeDevi Kumar
You do not have the LSTM.train config file. - excuse the brevity, sent from mobile On 05-Apr-2017 1:55 PM, wrote: > After u have said, > > I tried in two ways and i am stuck at lstm step: > > Training > > command used: > >

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread ShreeDevi Kumar
Tesstrain.sh generates a file called eng.training_files.txt. You are using the command without the .txt extension. Check the name of the generated file and use that. I have found that editing that file also gives errors. - excuse the brevity, sent from mobile On 04-Apr-2017 7:01 PM,

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-03 Thread ShreeDevi Kumar
Saurabh, It depends on what you want to do with the bash script. Here is a sample of a script I used to compare results using diff tessdata files by looping thru a set of image files. Google the bash commands to figure out what they do! #!/bin/bash set -vx export

Re: [tesseract-ocr] Error while creating training data for Japanese

2017-04-03 Thread ShreeDevi Kumar
jpn.config in langdata/jpn is loading jpn_vert as a sublanguage tessedit_load_sublangs jpn_vert You can try without that. Also look at the settings for jpn in training/language-specific.sh You may need to change the following also .. # The following fonts will be rendered vertically in phase

Re: [tesseract-ocr] Low Accurate ini bold font

2017-03-31 Thread ShreeDevi Kumar
Did you build it with debug option? That number refers to the git revision of the code, so it is easy to know what version of source commit it refers to. Look in github for commit that begins with that number. ShreeDevi भजन - कीर्तन -
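
As a small illustration of reading the revision out of the version banner: the sketch below assumes the version string has the tag-count-gHASH shape seen earlier in this thread (v4.00.00alpha-347-g60c8b12), where the part after -g is the abbreviated commit. commit_from_version is a hypothetical helper name.

```shell
# Hypothetical helper: extract the abbreviated git commit from a
# version string of the assumed form <tag>-<count>-g<hash>.
commit_from_version() {
  local v="$1"
  case "$v" in
    *-g*) printf '%s\n' "${v##*-g}" ;;  # keep everything after the last "-g"
    *)    return 1 ;;                   # no commit suffix present
  esac
}
```

The resulting abbreviation can then be searched for in the GitHub commit list, or passed to git show in a local clone.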

Re: [tesseract-ocr] Re: Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-12 Thread ShreeDevi Kumar
You can use jtessboxeditor to edit the box files. Make sure to mark EOL if you are trying to train using scanned images. Also note that this part of code is untested - training 4.0 using pre-existing images and box files. Ray has only explained method for using images created by text2image.

Re: [tesseract-ocr] Help in TrainingTesseract 4.00 Finetune

2017-04-12 Thread ShreeDevi Kumar
--linedata-only means that it will only try to create lstmf files and not the files for 3.0x training - excuse the brevity, sent from mobile On 12-Apr-2017 10:39 AM, "Ahmad Moawad" wrote: > Hello All, > > I want help in trainingTesseract 4.00 Finetune >

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Lstm training is not like legacy training. Please read the wiki pages regarding 4.0 training. I have given all sample commands there. There are 3 different ways of training. Read the bash scripts regarding training to know more. tesstrain.sh with --linedata-only creates the box tiff pairs but

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Read the bash scripts tesstrain.sh, tesstrain_utils.sh and language-specific.sh in the training directory to understand LSTM training in more detail. - excuse the brevity, sent from mobile On 12-Apr-2017 10:47 AM, "Ahmad Moawad" wrote: > this is the part from

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
if ((LINEDATA)); then
  phase_E_extract_features "lstm.train" 8 "lstmf"
  make__lstmdata
else
  phase_E_extract_features "box.train" 8 "tr"
  phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
  if

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Arabic was never trained with the legacy tesseract engine and I doubt you will get any improvement over existing traineddata using cube or lstm. You are free to experiment and see what you come up with. I have pointed to the bash scripts for training. Please refer to them for the correct

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage Follow correct order of variables tesseract imagename|stdin outputbase|stdout [options...] [configfile...] ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] How to add Armenian language support to tesseract

2017-04-11 Thread ShreeDevi Kumar
I have added this at https://github.com/tesseract-ocr/langdata/issues/67 Please add more information there: Which language code - arm or hye; Modern Armenian or Classical Armenian; sources for primary texts in unicode in the Armenian language to use for training; freely available unicode fonts to

Re: [tesseract-ocr] Tesseract Installation

2017-04-11 Thread ShreeDevi Kumar
You can ignore it. I get it too when using sudo 2nd time. Host name must be the id for your computer under windows10. Have u tried running tesseract after that? - excuse the brevity, sent from mobile On 11-Apr-2017 4:10 PM, "Ibr" wrote: Hi, I'm trying to install the

Re: [tesseract-ocr] Re: Tesseract Installation

2017-04-11 Thread ShreeDevi Kumar
Also, if you want training tools, you need to build them separately - see https://github.com/tesseract-ocr/tesseract/wiki/Compiling
make training
sudo make training-install
ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread ShreeDevi Kumar
Please open as issue, as problem related to --psm 0. - excuse the brevity, sent from mobile On 13-Apr-2017 9:29 AM, "Pritam Dodeja" wrote: > Find below - I can also ship my docker container to you if you want so you > can see my exact setup, it's about 1.15GB > >

Re: [tesseract-ocr] Training tesseract-ocr unicharset_extractor, mftraining, cntraining

2017-04-21 Thread ShreeDevi Kumar
If you want to OCR an invoice like the sample you posted, just use the eng.traineddata and OCR the page. You do not need to do any training. Here is the output I get 8633 0410 NO RP 11 07122015 NYNN 01 01 0001 Page 2 Of 3 Did you know? Your Comcast Business Internet service gives

Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2017-04-18 Thread ShreeDevi Kumar
I haven't built 3.05 so cannot help. I would suggest that you try with older commits of tesseract 3.05 branch to see which one works. Hope that those who have built 3.05 on mac will help.

Re: [tesseract-ocr] Re: Tesseract Installation

2017-04-19 Thread ShreeDevi Kumar
You can check that these are installed by entering the following which text2image The above will show u the location it is installed If you don't have training tools, you will need to build them separately - see https://github.com/tesseract-ocr/tesseract/wiki/Compiling make training sudo make
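
The same check can be run for several tools at once. The sketch below uses command -v in place of which; missing_training_tools is a hypothetical helper, and the tool names in the usage note are only examples taken from this list's threads.

```shell
# Hypothetical helper: print each named tool that is not on PATH.
# Returns 0 when everything was found, 1 when at least one is missing.
missing_training_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, missing_training_tools text2image combine_tessdata lstmtraining prints one line per tool that still needs to be built and installed.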

Re: [tesseract-ocr] Re: issue with simple reading of numbers 9 and 8

2017-04-23 Thread ShreeDevi Kumar
362b68e) ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Apr 23, 2017 at 9:25 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > Try training using more samples of 8, 9, B etc. > > What res

Re: [tesseract-ocr] Re: issue with simple reading of numbers 9 and 8

2017-04-22 Thread ShreeDevi Kumar
Try training using more samples of 8, 9, B etc. What results do you get with the provided eng.traineddata? Are they better or worse? Have you tried changing DPI of image to 300? - excuse the brevity, sent from mobile On 22-Apr-2017 10:29 PM, "James Abney" wrote: > Oh yes

Re: [tesseract-ocr] Caching in TrainLineRecognizer?

2017-03-10 Thread ShreeDevi Kumar
I have added it as an issue. Please see https://github.com/tesseract-ocr/tesseract/issues/754 You may want to create a pull request, if you have a solution. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Mar 5,

Re: [tesseract-ocr] Tesseract 4's LSTM classifier

2017-03-08 Thread ShreeDevi Kumar
The only public information regarding LSTM that has been shared by Google/Ray is linked from the following pages: https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016

Re: [tesseract-ocr] Major changes between stable 3.04.01 and 4.0

2017-03-02 Thread ShreeDevi Kumar
Also see https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 2, 2017 at 8:46 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

Re: [tesseract-ocr] Major changes between stable 3.04.01 and 4.0

2017-03-02 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/tesseract/blob/master/ChangeLog https://github.com/tesseract-ocr/tesseract/releases https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-03-02 Thread ShreeDevi Kumar
screenshot of warning means that your image does not have resolution info. Your OCR output file should have been created. Training 4.0 is not easy. Please see https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM ShreeDevi भजन

Re: [tesseract-ocr] Tesseract 4.0 doesn't see the changes after Arabic traning

2017-04-08 Thread ShreeDevi Kumar
Arabic traineddata for 3.0x uses cube engine. Training process for that was never shared. Now the cube engine has been removed for lstm 4.0, which is still in alpha stage. There is 4.0alpha traineddata for Arabic and you can train for it, but accuracy is not great. Ray is doing another training

Re: [tesseract-ocr] Read 2 column Image Horizontally (line by line) rather than Vertically (column by column)

2017-04-06 Thread ShreeDevi Kumar
Have u tried --psm 6 - excuse the brevity, sent from mobile On 06-Apr-2017 11:06 PM, "Mike Hall" wrote: > We have a C# .Net app that is using Tesseract to do Optical Character > Recognition (OCR) on .tiff files. I've attached a sample tiff file. > > We are then

Re: [tesseract-ocr] (Advise needed) Command Output Fails and gives error in Tesseract 4 during fine tuning

2017-04-06 Thread ShreeDevi Kumar
You must be using an old version of traineddata which does not have LSTM. - excuse the brevity, sent from mobile On 07-Apr-2017 2:13 AM, wrote: > I am following this link https://github.com/tesseract-ocr/tesseract/wiki/ > TrainingTesseract-4.00---Finetune > > For genaerating

Re: [tesseract-ocr] Read 2 column Image Horizontally (line by line) rather than Vertically (column by column)

2017-04-06 Thread ShreeDevi Kumar
Normally, for text output, the other config files should not impact. - excuse the brevity, sent from mobile On 07-Apr-2017 2:18 AM, "Mike Hall" wrote: > Yes, we are using the -psm 6 command line argument. And it was not > working. > > But I figured out the issue. > >

Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2017-04-18 Thread ShreeDevi Kumar
Use latest version of leptonica - 1.74.1 https://github.com/DanBloomberg/leptonica ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Apr 17, 2017 at 8:18 PM, Peter Reid wrote: > I've done

Re: [tesseract-ocr] Re: Standalone Self-contained Tesseract-OCR for Mac

2017-04-18 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/wiki/Compiling If you are building tesseract 4.0, you need Lept 1.74 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Apr 18, 2017 at 2:25 PM, Peter Reid

Re: [tesseract-ocr] ERROR: Could not find training text file

2017-07-31 Thread ShreeDevi Kumar
add a line similar to following to your training command, pointing to where you have your training text --training_text ../langdata/eng/eng.training_text \ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul

Re: [tesseract-ocr] Re: Combining tessdata files Error opening unicharset file

2017-07-28 Thread ShreeDevi Kumar
You need to mv or rename the files with por. prefix then when you use combine_tessdata command it will use all por. files to create traineddata. see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh mv ${TRAINING_DIR}/inttemp
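
The renaming step might be sketched as follows. add_lang_prefix is a hypothetical helper, and the list of file names (inttemp, pffmtable, normproto, shapetable, unicharset) is an assumption about which legacy training outputs are present; check tesstrain_utils.sh for the authoritative list.

```shell
# Hypothetical helper: give un-prefixed training outputs the
# "<lang>." prefix that combine_tessdata looks for.
# The file list below is an assumption, adjust it to your outputs.
add_lang_prefix() {
  local dir="$1" lang="$2" f
  for f in inttemp pffmtable normproto shapetable unicharset; do
    if [ -e "$dir/$f" ]; then
      mv "$dir/$f" "$dir/$lang.$f"
    fi
  done
}
```

After something like add_lang_prefix . por, running combine_tessdata por. in that directory should pick up the por.* files.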

Re: [tesseract-ocr] "Can't encode transcript" error when using "lstmtraining" command with Tess4.0

2017-08-01 Thread ShreeDevi Kumar
Ray has uploaded new traineddata files in https://github.com/tesseract-ocr/tessdata/tree/master/best Why don't you first try recognition with that ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Aug 1, 2017 at

Re: [tesseract-ocr] Creation of encoded unicharset failed While constructing LSTM training data.

2017-08-10 Thread ShreeDevi Kumar
Seems to work fine for me. Are you sure that you have relevant files in the directories listed in that command? check tessdata, langdata location. Use tessdata/best/*.traineddata as the existing models. ShreeDevi भजन - कीर्तन -

Re: [tesseract-ocr] Newbie: wondering why a fairly crisp document has such low accuracy

2017-08-12 Thread ShreeDevi Kumar
With English you should probably get close to 99% accuracy. Is your png at 300 dpi? Which version of tesseract did you use? Which traineddata? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Aug 12, 2017 at

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-07-07 Thread ShreeDevi Kumar
for 3.05 don't you need to checkout the 3.05 branch?? master is for 4.0 development. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jul 7, 2017 at 9:22 PM, akhil katpally wrote: >

[tesseract-ocr] Fwd: [tesseract-ocr/tesseract] Tag a new version for LSTM 4.0 (#995)

2017-07-11 Thread ShreeDevi Kumar
​Forwarding update by Ray. -- Forwarded message -- From: theraysmith Date: Wed, Jul 12, 2017 at 5:55 AM Subject: Re: [tesseract-ocr/tesseract] Tag a new version for LSTM 4.0 (#995) To: tesseract-ocr/tesseract I'm about

Re: [tesseract-ocr] While extracting numbers tesseract makes a lot of errors

2017-07-09 Thread ShreeDevi Kumar
If using 3.05 branch try configs such as digits whitelist ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Jul 9, 2017 at 7:36 PM, Prav wrote: > Any suggestions for any configuration which i
