Re: [tesseract-ocr] Can "traineddata" for Tesseract 3 be used for Tesseract 4

2018-06-14 Thread chandra churh chatterjee
How can the images described above be converted into fonts, so that the tesstrain.sh command can be run to generate the image files along with the box and .lstmf files? On Thu, Jun 14, 2018 at 11:05 AM chandra churh chatterjee <chandrachurh.chatterje...@gmail.com> wrote: > can you tell me from which dir

[tesseract-ocr] Can "traineddata" for Tesseract 3 be used for Tesseract 4

2018-06-13 Thread chandra churh chatterjee
I have trained Tesseract 3 with 64 fonts using the respective box and .tr files, but now I want to use the same trained data for training Tesseract 4, after creating the starter traineddata as described in "Using tesstrain": "The setup for running tesstrain.sh is the same as for base Tesseract. Use

Re: [tesseract-ocr] Can "traineddata" for Tesseract 3 be used for Tesseract 4

2018-06-13 Thread chandra churh chatterjee
ile.exp0.tif lang.file.exp0 lstm.train > > lstm.train is a config file. > > ShreeDevi > > > On Wed, Jun 13, 2018 at 6:46 PM chandra churh chatterjee
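The reply runs `tesseract` on an image with the `lstm.train` config so that it writes an .lstmf file next to the output base name. A minimal Python sketch for doing this over a whole directory, assuming each `<base>.tif` already has its matching `<base>.box` beside it and `tesseract` is on the PATH; `lstmf_commands` and `run_all` are hypothetical helper names, and the default is a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def lstmf_commands(image_dir):
    """Build one `tesseract <image> <base> lstm.train` command per .tif file.

    Assumes each <base>.tif has a matching <base>.box in the same directory;
    with the lstm.train config, tesseract then writes <base>.lstmf.
    """
    commands = []
    for tif in sorted(Path(image_dir).glob("*.tif")):
        base = tif.with_suffix("")  # e.g. lang.font.exp0.tif -> lang.font.exp0
        commands.append(["tesseract", str(tif), str(base), "lstm.train"])
    return commands


def run_all(image_dir, dry_run=True):
    """Print every command, and execute them only when dry_run is False."""
    for cmd in lstmf_commands(image_dir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to inspect the list before launching a long batch.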

[tesseract-ocr] Check validity of box and image files

2018-07-03 Thread chandra churh chatterjee
We are trying to train Tesseract 4 on handwritten images and have generated the following types of images and their respective box files. We can't tell whether our box files are correct or not. Can anyone please confirm?
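A first sanity check on box files can be automated. The sketch below assumes the classic Tesseract box format, one glyph per line as `<symbol> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image; it does not handle `WordStr` lines or other variants, and `check_box_file` is a hypothetical helper. Passing these heuristic checks does not prove the boxes line up with the glyphs; that still needs a visual check.

```python
def check_box_file(lines, width, height):
    """Return (line_no, problem) pairs for lines that look invalid.

    Assumes the classic box format
        <symbol> <left> <bottom> <right> <top> <page>
    with integer pixel coordinates from the image's bottom-left corner.
    """
    problems = []
    for no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.rsplit(" ", 5)  # split from the right: symbol comes first
        if len(parts) != 6:
            problems.append((no, "expected 6 fields"))
            continue
        symbol, *nums = parts
        try:
            left, bottom, right, top, page = (int(n) for n in nums)
        except ValueError:
            problems.append((no, "non-integer coordinate"))
            continue
        if not symbol:
            problems.append((no, "empty symbol"))
        if left >= right or bottom >= top:
            problems.append((no, "empty or inverted box"))
        if left < 0 or bottom < 0 or right > width or top > height:
            problems.append((no, "box outside image"))
        if page < 0:
            problems.append((no, "negative page number"))
    return problems
```

For a 100x100 image, `check_box_file(["a 10 20 30 40 0", "b 50 20 40 40 0"], 100, 100)` flags only the second line, whose left edge lies to the right of its right edge.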

[tesseract-ocr] Re: Training tesseract for hand written letters

2018-06-20 Thread chandra churh chatterjee
What is the format of your dataset, and what does it contain? Can you please tell me the details? As you mentioned above, you are training Tesseract 2.04; I am trying to do the same kind of handwriting-recognition work using Tesseract 4.0, and I would also like to know the volume of your

Re: [tesseract-ocr] combine_tessdata. Failed to read /usr/share/tesseract-ocr/tessdata/foo.traineddata

2018-07-29 Thread chandra churh chatterjee
Keep foo.traineddata inside the tessdata folder and then run the command. On Sun, Jul 29, 2018 at 5:00 AM wrote: > I am using a bash script to train an LSTM model. I have the images and box > file. > > > My problem is the error returns when the command combine_tessdata > is executed. also i
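If the training script is automated, the advice above (put the file into the tessdata folder first) can be made an explicit step. A minimal sketch; `ensure_in_tessdata` is a hypothetical helper, and the tessdata location is whatever your installation uses (the thread's error mentions /usr/share/tesseract-ocr/tessdata, which usually needs root to write to).

```python
import shutil
from pathlib import Path


def ensure_in_tessdata(traineddata, tessdata_dir):
    """Copy a .traineddata file into tessdata_dir unless it is already there.

    Returns the path inside tessdata_dir, which is what the later
    combine_tessdata call should be pointed at.
    """
    src = Path(traineddata)
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")
    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Skip the copy when src already is the file inside tessdata_dir;
    # shutil.copy2 would otherwise raise SameFileError.
    if src.resolve() != dest.resolve():
        shutil.copy2(src, dest)
    return dest
```

Calling it twice is harmless: the second call sees that the file is already in place and just returns its path.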

Re: [tesseract-ocr] tesseract does not recognize grey colored fonts in the images..

2018-08-01 Thread chandra churh chatterjee
Binarizing the image might solve the problem. Chandra Churh Chatterjee On Sat, Jul 28, 2018, 8:30 PM Yogesh Sanchihar <yogesh.yogesh.sanchih...@gmail.com> wrote: > If we have a text not black, but light greyish. tesseract does not > recognize it. > > Any solutio
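Binarizing maps every grey pixel to pure black or pure white, so light-grey strokes become solid ink before OCR. The reply does not name a method; the sketch below uses Otsu's global threshold as one common choice, working on a flat list of 8-bit grey values rather than an image file (in practice you would load and save pixels with an imaging library). `otsu_threshold` and `binarize` are illustrative names.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for a sequence of 8-bit grey values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0        # sum of grey values at or below the candidate threshold
    weight_bg = 0     # number of pixels at or below the candidate threshold
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; Otsu picks the t that maximizes it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels):
    """Map pixels <= threshold to 0 (ink) and the rest to 255 (paper)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```

For light-grey text (value 180) on a near-white background (value 250), the threshold falls at 180, so every text pixel becomes solid black. A single global threshold can struggle with uneven lighting; adaptive methods are the usual next step there.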

Re: [tesseract-ocr] What is the purpose of trained data files present under tessdata/script folder

2018-07-19 Thread chandra churh chatterjee
ng --oem 1 This command makes Tesseract 4 use eng.traineddata for evaluation, with --oem 1 selecting the LSTM engine. Chandra Churh Chatterjee On Thu, Jul 19, 2018, 11:59 PM Vikas Goel wrote: > After installing tesseract, there are trained data files present under > "C:\Program Files (x86)\Tesseract-OCR\tessdata" as well a
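The command in the reply is cut off, so the sketch below assumes the usual form `tesseract <image> <outputbase> -l <lang> --oem 1`, where `-l eng` loads eng.traineddata from the tessdata directory and `--oem 1` restricts Tesseract 4 to the LSTM engine. `ocr_command` and `run_ocr` are hypothetical wrappers, not part of any Tesseract API.

```python
import subprocess


def ocr_command(image, output_base, lang="eng", oem=1):
    """Build a tesseract command line.

    -l <lang> selects <lang>.traineddata; --oem 1 selects the LSTM engine only.
    """
    return ["tesseract", image, output_base, "-l", lang, "--oem", str(oem)]


def run_ocr(image, output_base, lang="eng"):
    """Run tesseract; by default it writes plain text to <output_base>.txt."""
    subprocess.run(ocr_command(image, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as fh:
        return fh.read()
```

For example, `run_ocr("page.png", "page")` would OCR a hypothetical page.png and return the contents of page.txt.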

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-19 Thread chandra churh chatterjee
g only 0-9 digits in a random function, create such a text corpus and generate the starter traineddata. 3. Use the starter traineddata to generate the final traineddata after LSTM training. If you want a detailed description, I can supply you with complete documentation of the steps. Chandra Churh
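Step 2 builds a training text from digits produced by a random function. A minimal sketch; the line count, word lengths, words per line, and the seed are illustrative assumptions, not values from the thread, and `digit_corpus` is a hypothetical name.

```python
import random


def digit_corpus(n_lines, min_len=1, max_len=10, seed=None):
    """Return n_lines of random digit strings, one line per corpus line.

    Each line holds 1-8 space-separated "words" made only of 0-9, so the
    text contains no characters other than digits, spaces, and newlines.
    """
    rng = random.Random(seed)  # seeded generator makes the corpus reproducible
    lines = []
    for _ in range(n_lines):
        words = [
            "".join(rng.choice("0123456789")
                    for _ in range(rng.randint(min_len, max_len)))
            for _ in range(rng.randint(1, 8))
        ]
        lines.append(" ".join(words))
    return "\n".join(lines) + "\n"
```

The result can be written to a training-text file (for example a hypothetical digits.training_text) with `Path(...).write_text(digit_corpus(10000, seed=42))` before generating the starter traineddata.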

Re: [tesseract-ocr] Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2018-07-19 Thread chandra churh chatterjee
90%-95% HIGHEST ACCURACY: 100% On Thu, Jul 19, 2018 at 4:02 PM Ramakant Kushwaha <ramakant.sing...@gmail.com> wrote: > Thanks @Chandra, I am a beginner at this. Please help me with the complete > documentation. > > > On Thu, Jul 19, 2018 at 3:38 PM, chandra churh chatterjee

Re: [tesseract-ocr] Training Tesseract Arabic/Hindi Digits using JTessBoxEditor in window 10

2018-07-19 Thread chandra churh chatterjee
The bad box error might be due to the images you are using for training in jTessBoxEditor. Check the resolution of the images. Chandra Churh Chatterjee On Thu, Jul 19, 2018, 3:22 PM Marwa M. Khan wrote: > Hello, > >I am trying to train Tesseract 4.0 with LSTM on Arabic/Hindi
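One quick way to check an image's resolution is to read the DPI stored in the file. The sketch below handles PNG only, reading the standard `pHYs` chunk (pixels per metre when its unit byte is 1); TIFF files, which jTessBoxEditor also uses, store resolution differently and are better read with an imaging library. `png_dpi` is a hypothetical helper; Tesseract's own documentation suggests images of roughly 300 DPI or more work best.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if unavailable.

    Walks the chunk list without verifying CRCs; this is a quick check,
    not a full PNG parser.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and length == 9:
            x_ppm, y_ppm, unit = struct.unpack(">IIB", body)
            if unit != 1:
                return None  # unit 0: aspect ratio only, no physical size
            # pixels per metre -> dots per inch (1 inch = 0.0254 m)
            return round(x_ppm * 0.0254), round(y_ppm * 0.0254)
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
    return None
```

Usage would be `png_dpi(Path("sample.png").read_bytes())` for a hypothetical sample.png; a `None` result means the file carries no physical resolution, which is itself worth fixing before training.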

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread chandra churh chatterjee
Excuse me, @Shree Devi Kumar, can you please tell me whether training data for Tesseract 4.0 would be better if its images contain paragraphs of handwritten text or single handwritten characters, as follows? On Wed, Jun 20, 2018 at 9:00 PM Shree Devi Kumar wrote: > You will have better

Re: [tesseract-ocr] java.lang.UnsatisfiedLinkError: The specified module could not be found.

2018-06-28 Thread chandra churh chatterjee
@Shree Devi Kumar, can I get a complete, detailed description of the Tesseract 4 neural network architecture, with a diagram, explaining what the net_spec command-line option of LSTM training specifies? On Tue, Jun 26, 2018 at 1:42 PM Shree Devi Kumar wrote: > Please post in
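The net_spec value is written in VGSL, a compact layer-by-layer notation. The sketch below decodes a spec of the kind used in the Tesseract 4 training examples, `[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]`, into readable descriptions. It covers only the layer types in that example (input, convolution, max-pool, LSTM, CTC output) and is an illustrative decoder, not Tesseract's parser; consult the VGSL specification for the full language.

```python
import re

NONLIN = {"s": "sigmoid", "t": "tanh", "r": "relu", "l": "linear", "m": "softmax"}
DIRECTION = {"f": "forward", "r": "reverse", "b": "bidirectional"}


def describe_layer(token):
    """Describe one VGSL token; only a few layer types are handled."""
    if m := re.fullmatch(r"(\d+),(\d+),(\d+),(\d+)", token):
        b, h, w, d = m.groups()
        return f"input batch={b} height={h} width={w} depth={d} (0 = variable)"
    if m := re.fullmatch(r"C([stlrm])(\d+),(\d+),(\d+)", token):
        f, y, x, d = m.groups()
        return f"conv {y}x{x}, {d} outputs, {NONLIN[f]}"
    if m := re.fullmatch(r"Mp(\d+),(\d+)", token):
        return f"max-pool {m.group(1)}x{m.group(2)}"
    if m := re.fullmatch(r"L([frb])([xy])(s?)(\d+)", token):
        direction, dim, summ, n = m.groups()
        extra = ", summarizing" if summ else ""  # 's' collapses that dimension
        return f"{DIRECTION[direction]} LSTM along {dim}{extra}, {n} outputs"
    if m := re.fullmatch(r"O1c(\d+)", token):
        return f"1-D output layer with CTC, {m.group(1)} classes"
    return f"unrecognized: {token}"


def describe_net_spec(spec):
    """Strip the brackets, split on whitespace, and describe each layer."""
    return [describe_layer(t) for t in spec.strip().strip("[]").split()]
```

Printing `describe_net_spec(...)` line by line for the example spec gives an ordered outline of the network (greyscale input of height 36, a 3x3 tanh convolution, 3x3 max-pooling, a y-summarizing LSTM, three x-direction LSTMs, and a CTC output layer), which could serve as the skeleton for the diagram requested above. The output class count normally matches the size of the training unicharset.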