[tesseract-ocr] Re: What can be done to improve the accuracy of extract

2017-07-17 Thread srnsp92
Hello Tom, so if a particular case is to be trained, should I use one training text rendered in many fonts, or a large amount of training text (many copies of the training text, one after another in a single file) in only one font? Which would be the ideal choice? Can you please tell

[tesseract-ocr] Re: How to improve the recognition of receipt (text not in words dictionary)

2017-07-13 Thread srnsp92
Hello Laura, can you please tell me whether you have achieved this? I am also trying to do the same thing; if you have, could you please give me some advice? On Tuesday, June 20, 2017 at 12:05:25 PM UTC+5:30, Laura wrote: > Hi, I’m new on tesseract. I’m trying to recognize receipts. Since on

[tesseract-ocr] Advice on training tesseract for a string (with some pattern)

2017-07-13 Thread srnsp92
There are three procedures we can use to train Tesseract, and they explain how to train it to find words and identify text: - Training From Scratch

[tesseract-ocr] Tesseract - strange error when trying to train tesseract

2017-07-12 Thread srnsp92
Hello, I am trying to train Tesseract, passing the training files and other parameters, but I am getting a strange error and cannot work out what it means. Please see tess_train_text for more details. Afterwards, when I use the eng.traineddata to extract the text of an image, I get the same error.

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Sorry, I gave the wrong commands for Arabic; I was actually referring to English:
tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train
unicharset_extractor eng.arial.exp4.box
echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
mftraining -F
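The 3.x box-training commands quoted above are part of a longer pipeline. A hedged Python sketch of the whole sequence as described in the Tesseract 3.x training documentation (the message itself is cut off after mftraining -F, so the later steps come from that documentation, not from the poster). It assumes the 3.x tools are on PATH and that font_properties already exists (for example, created with the echo command above); build_tess3_commands and run are hypothetical helpers, and by default nothing is executed.

```python
import shlex
import subprocess

def build_tess3_commands(lang, font, exp):
    """Return the Tesseract 3.x box-training steps as argv lists (hypothetical helper)."""
    base = f"{lang}.{font}.exp{exp}"
    tr = f"{base}.tr"
    cmds = [
        # Run Tesseract in training mode: image + box file -> <base>.tr features.
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        # Collect the character set from the box file into ./unicharset.
        ["unicharset_extractor", f"{base}.box"],
        # Cluster shapes and build the prototypes; font_properties must already exist.
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset", tr],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", tr],
        ["cntraining", tr],
    ]
    # Prefix the generated files with "<lang>." so combine_tessdata can find them.
    for part in ("inttemp", "normproto", "pffmtable", "shapetable"):
        cmds.append(["mv", part, f"{lang}.{part}"])
    # Pack the prefixed files into <lang>.traineddata.
    cmds.append(["combine_tessdata", f"{lang}."])
    return cmds

def run(lang, font, exp, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv in build_tess3_commands(lang, font, exp):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

if __name__ == "__main__":
    run("eng", "arial", 4)  # dry run: only prints the commands
```

Keeping each step as an argv list makes the sequence easy to inspect before anything touches the training directory.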

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Hello Shree, thank you for your valuable reply. Are there any changes I need to make to the steps below? Please suggest changes to the following commands, which are for Tesseract 3.0:
tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
unicharset_extractor ara.arial.exp4.box

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-12 Thread srnsp92
I am able to train Tesseract with the fine-tuning technique using some training text (not images), and I want to know how to train Tesseract with images and box files. I am getting confused because when I run the command tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train, tr files

[tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread srnsp92
Can you tell me when you got this, i.e., which command produced the error and at which step? On Wednesday, April 12, 2017 at 12:16:54 PM UTC+5:30, Pritam Dodeja wrote: > Hi, > I get segmentation faults when using page segmentation mode 0. Has anyone else

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Can you please tell me whether the command tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train is correct for training Tesseract 4 on image files? When I run it with Tesseract 4, it produces .tr files. On Wednesday, April 12, 2017 at 2:19:24 PM UTC+5:30, shree
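The box.train config writes the legacy (3.x-style) .tr feature files, which matches what is reported here. For LSTM training from existing images, the Tesseract 4.0 training documentation instead runs tesseract with the lstm.train config on each image that has a matching .box file, producing a .lstmf file; lstmtraining then reads a plain-text list of those .lstmf paths. A minimal sketch that only builds the commands and the list-file contents; lstmf_commands and listfile_text are hypothetical helpers, and it assumes each .tif has a same-named .box beside it.

```python
import os
from pathlib import PurePath

def lstmf_commands(filenames):
    """One 'tesseract <img> <base> lstm.train' command per .tif that has a matching .box."""
    names = set(filenames)
    cmds = []
    for name in sorted(names):
        p = PurePath(name)
        if p.suffix != ".tif":
            continue
        if str(p.with_suffix(".box")) not in names:
            continue  # skip images without ground-truth box files
        base = str(p.with_suffix(""))  # tesseract appends .lstmf to this output base
        cmds.append(["tesseract", name, base, "lstm.train"])
    return cmds

def listfile_text(cmds):
    """Contents for a training list file: the .lstmf each command is expected to write."""
    return "".join(f"{cmd[2]}.lstmf\n" for cmd in cmds)

if __name__ == "__main__":
    cmds = lstmf_commands(os.listdir("."))
    for argv in cmds:
        print(" ".join(argv))
    print(listfile_text(cmds), end="")
```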

[tesseract-ocr] Re: Tesseract (4 alpha) Ambiguous Situation while Correcting Chars in box file

2017-04-12 Thread srnsp92
Can you please tell me how to split a box and how to merge two boxes? I cannot find any options for this. An answer would help me and others as well. Thank you. On Tuesday, April 11, 2017 at 9:10:14 AM UTC+5:30, Quan Nguyen wrote: > For Case 1, you'll need
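The reply is cut off here. One way to do both operations by hand is to edit the box file as text: each line of a Tesseract box file has the form "char left bottom right top page", so merging two boxes means taking the union of their coordinates, and splitting means cutting one box at a column. The helpers below are hypothetical (they are not a Tesseract tool). The example that merges "r" and "n" into "m" is illustrative, and placing the split column on both halves is a simplifying assumption.

```python
from typing import NamedTuple

class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

def parse_box(line):
    """Parse one box-file line: '<char> <left> <bottom> <right> <top> <page>'."""
    char, *nums = line.rstrip("\n").rsplit(maxsplit=5)
    return Box(char, *(int(n) for n in nums))

def format_box(b):
    return f"{b.char} {b.left} {b.bottom} {b.right} {b.top} {b.page}"

def merge_boxes(a, b, char):
    """Replace two boxes with one labelled `char` that covers both."""
    if a.page != b.page:
        raise ValueError("boxes are on different pages")
    return Box(char, min(a.left, b.left), min(a.bottom, b.bottom),
               max(a.right, b.right), max(a.top, b.top), a.page)

def split_box(b, x, left_char, right_char):
    """Cut a box at column x into two boxes; both halves share column x."""
    if not b.left < x < b.right:
        raise ValueError("split column must lie strictly inside the box")
    return b._replace(char=left_char, right=x), b._replace(char=right_char, left=x)
```

For example, merge_boxes(parse_box("r 10 5 14 20 0"), parse_box("n 15 5 22 20 0"), "m") gives a single line "m 10 5 22 20 0", which replaces the two original lines in the .box file.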

[tesseract-ocr] Please help ... Error when giving eval list and training file

2017-04-07 Thread srnsp92
Command used:
/home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Downloads/trn1/trntxt/lstmtraining/eng.unicharset \
--script_dir /home/p/Documents/Tvat/TESS_4_ALPHA/langdata-master \
--model_output /home/p/Downloads/mytext/output/base \
--net_spec '[1,0,0,1 Ct5,5,16 Mp3,3
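In the Tesseract 4.x training documentation, lstmtraining reads its training and evaluation data from plain-text list files (--train_listfile and --eval_listfile), with one .lstmf path per line. A minimal sketch for producing the two lists, assuming you already have the .lstmf paths. split_train_eval and write_lists are hypothetical helpers, and holding out every fifth file is an arbitrary choice. The sketch drops blank lines and rejects entries that are not .lstmf files, since both are easy mistakes to make in these lists.

```python
def split_train_eval(lstmf_paths, eval_every=5):
    """Put every eval_every-th .lstmf file in the eval list and the rest in the train list."""
    if eval_every < 2:
        raise ValueError("eval_every must be >= 2 so the training list is not empty")
    paths = [p.strip() for p in lstmf_paths if p.strip()]  # drop blank lines
    bad = [p for p in paths if not p.endswith(".lstmf")]
    if bad:
        raise ValueError(f"not .lstmf files: {bad}")
    train, evaluation = [], []
    for i, p in enumerate(paths):
        (evaluation if i % eval_every == eval_every - 1 else train).append(p)
    return train, evaluation

def write_lists(lstmf_paths, train_file, eval_file, eval_every=5):
    """Write one path per line to each list file and return both lists."""
    train, evaluation = split_train_eval(lstmf_paths, eval_every)
    for name, items in ((train_file, train), (eval_file, evaluation)):
        with open(name, "w", encoding="utf-8") as fh:
            fh.write("".join(p + "\n" for p in items))
    return train, evaluation
```

The resulting files would then be passed as --train_listfile train.txt --eval_listfile eval.txt (file names are placeholders).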

Re: [tesseract-ocr] (Advise needed) Command Output Fails and gives error in Tesseract 4 during fine tuning

2017-04-06 Thread srnsp92
Thank you, Shree Devi. God bless you. It's exactly the solution I needed. May I know how you learned all of this about Tesseract? On Friday, April 7, 2017 at 6:38:31 AM UTC+5:30, shree wrote: > You must be using an old version of traineddata which does not have LSTM.

[tesseract-ocr] (Advise needed) Command Output Fails and gives error in Tesseract 4 during fine tuning

2017-04-06 Thread srnsp92
I am following this link to generate the files for fine-tuning: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
Command used (for reference):
combine_tessdata -e ../tessdata/ara.traineddata \
~/tesstutorial/aratuned_from_ara/ara.lstm
Command used
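The combine_tessdata -e step above extracts the LSTM component from a traineddata file. According to shree's reply in the thread above, an older traineddata may not contain an LSTM component at all. A hedged sketch that runs the same extraction and then checks that a non-empty .lstm file was actually written. The helper names are hypothetical, and using the output file to detect the problem is an assumption, not documented combine_tessdata behavior.

```python
import os
import subprocess

def extract_lstm_cmd(traineddata, out_lstm):
    """argv for 'combine_tessdata -e <traineddata> <out.lstm>' as used in the fine-tuning guide."""
    return ["combine_tessdata", "-e", traineddata, out_lstm]

def lstm_extracted(out_lstm):
    """True if the extracted .lstm component exists and is non-empty."""
    return os.path.isfile(out_lstm) and os.path.getsize(out_lstm) > 0

def extract_lstm(traineddata, out_lstm):
    # subprocess does not expand "~", so expand it here.
    traineddata = os.path.expanduser(traineddata)
    out_lstm = os.path.expanduser(out_lstm)
    os.makedirs(os.path.dirname(out_lstm) or ".", exist_ok=True)
    subprocess.run(extract_lstm_cmd(traineddata, out_lstm), check=True)
    if not lstm_extracted(out_lstm):
        raise RuntimeError(f"no LSTM component extracted from {traineddata}; "
                           "it may be an older, non-LSTM traineddata")
```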

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread srnsp92
Hello ShreeDevi, I solved the lstm.train error; I had given the wrong path.
mkdir -p ~/tesstutorial/engoutput
training/lstmtraining -U ~/tesstutorial/engtrain/eng.unicharset \
--script_dir ../langdata --debug_interval 100 \
--net_spec '[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256
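The --net_spec value is written in Tesseract's VGSL notation. The bracketed list starts with the input shape (batch, height, width, depth, where 0 means variable) and continues with layers such as Ct (convolution with tanh), Mp (max-pooling), and L (LSTM, where f/r means forward/reverse, x/y gives the direction, and an optional s summarizes). A minimal decoder for only the layer types that appear in these commands, intended as a reading aid. describe_net_spec is a hypothetical helper, and the output layer, which is cut off in the messages, is not handled.

```python
import re

ACT = {"s": "sigmoid", "t": "tanh", "r": "relu", "l": "linear", "m": "softmax"}
DIR = {"f": "forward", "r": "reverse", "b": "bidirectional"}

def describe_net_spec(spec):
    """Describe the VGSL layers that appear in the commands above (partial grammar only)."""
    out = []
    for tok in spec.replace("[", " ").replace("]", " ").split():
        if m := re.fullmatch(r"(\d+),(\d+),(\d+),(\d+)", tok):
            b, h, w, d = (int(g) for g in m.groups())
            # A 0 dimension means the size is variable.
            out.append(f"input: batch={b}, height={h or 'variable'}, "
                       f"width={w or 'variable'}, depth={d}")
        elif m := re.fullmatch(r"C([strlm])(\d+),(\d+),(\d+)", tok):
            act, y, x, d = m.groups()
            out.append(f"conv {y}x{x}, {d} outputs, {ACT[act]}")
        elif m := re.fullmatch(r"Mp(\d+),(\d+)", tok):
            out.append(f"maxpool {m[1]}x{m[2]}")
        elif m := re.fullmatch(r"L([frb])([xy])(s?)(\d+)", tok):
            direction, axis, summ, n = m.groups()
            extra = ", summarizing" if summ else ""
            out.append(f"LSTM {DIR[direction]} along {axis}{extra}, {n} outputs")
        else:
            out.append(f"not covered by this sketch: {tok}")
    return out

if __name__ == "__main__":
    for line in describe_net_spec("[1,36,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256"):
        print(line)
```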

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread srnsp92
Please help me: how can I get the LSTM.train config file? I need to work with Tesseract 4 only; I don't have any other option. On Wednesday, April 5, 2017 at 1:59:56 PM UTC+5:30, shree wrote: > You do not have the LSTM.train config file.

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-05 Thread srnsp92
You can use *.* when specifying the files, but make sure that only image files are supplied: * matches every file, so all available files will be taken as input. 1) Could you please help me with the posts I made today? 2) Please also guide me on how I can generate
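Instead of relying on *.*, the input can be filtered by extension so that .box, .tr, or text files in the same folder are never passed as images. A minimal sketch in which the set of image extensions is an assumption and image_files is a hypothetical helper.

```python
from pathlib import Path

IMAGE_EXTS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}  # assumed set; adjust to your data

def image_files(names):
    """Keep only names whose extension is a known image type (case-insensitive)."""
    return sorted(n for n in names if Path(n).suffix.lower() in IMAGE_EXTS)

if __name__ == "__main__":
    # Unlike *.*, this skips .box, .tr, .txt and other non-image files in the folder.
    for name in image_files(p.name for p in Path(".").iterdir() if p.is_file()):
        print(name)
```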

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread srnsp92
Overview of Training Process
The overall training process is similar to training 3.04. Conceptually the same:
1. Prepare training text.
2. Render text to image + box file. (Or create hand-made box files for existing
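In the 3.04/4.0 training documentation, step 2 is carried out with text2image. A minimal sketch that builds one text2image call per font, with every call rendering the same training text. This corresponds to the "one text, many fonts" option asked about at the top of this listing. The flags (--text, --outputbase, --font, --fonts_dir) follow that documentation; the file names, font names, and fonts directory are placeholders, and text2image_commands is a hypothetical helper.

```python
import subprocess

def text2image_commands(text_file, lang, fonts, fonts_dir, exp=0):
    """One text2image call per font, all rendering the same training text."""
    cmds = []
    for font in fonts:
        # Output base follows the <lang>.<fontname>.exp<N> naming used in this thread.
        base = f"{lang}.{font.replace(' ', '')}.exp{exp}"
        cmds.append([
            "text2image",
            f"--text={text_file}",
            f"--outputbase={base}",
            f"--font={font}",
            f"--fonts_dir={fonts_dir}",
        ])
    return cmds

def render_all(cmds, dry_run=True):
    for argv in cmds:
        print(argv)
        if not dry_run:
            subprocess.run(argv, check=True)
```

Because each command is an argv list, font names that contain spaces need no shell quoting.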

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-05 Thread srnsp92
Following what you said, I tried two ways, and I am stuck at the LSTM step. Training command used:
/home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/eng.unicharset \
--script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master --debug_interval 100 \

[tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-04 Thread srnsp92
Can you please share some of your experience in this thread, as there are no posts yet on training Tesseract 4? 1) Also, is there any way to add a newly trained data file to an old traineddata file without replacing the old file? 2) If we don't know which fonts may appear in our images, how should we

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread srnsp92
I am trying to train Tesseract 4 and I am getting the following error. Command used:
mkdir -p /home/p/Documents/T/engoutput
/home/p/Documents/T/tesseract-master/training/lstmtraining -U /home/p/Documents/T/img_frm_3/unicharset \
--script_dir /home/p/Documents/T/TESS_4_ALPHA/langdata-master

Re: [tesseract-ocr] train tesseract OCR 4.0

2017-04-04 Thread srnsp92
Hello ShreeDevi, the link https://medium.com/apegroup-texts/training-tesseract-for-labels-receipts-and-such-690f452e8f79 contains a full-fledged tutorial on using and training Tesseract 3.0. Can you please clarify the points below?