[tesseract-ocr] coordinates of a skewed image

2017-04-12 Thread Varun Ejanthkar
Hi, I have given a skewed image as input to Tesseract-ocr. The plain text output is generated from the de-skewed image and is correct. But in the hOCR output, the bounding box coordinates of the words are relative to the original skewed image. So, is there any way to get the coordinates
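
One way to relate the two coordinate frames is to rotate each hOCR box by the skew angle. The following is only a minimal sketch, not something Tesseract provides: it assumes the skew angle is already known, that the rotation is about the image center, and that a positive angle follows the usual image convention (y grows downward). The names `rotate_point` and `rotate_bbox` are hypothetical helpers, and the sign of the angle may need flipping for a given image.

```python
import math


def rotate_point(x, y, angle_deg, cx, cy):
    """Rotate (x, y) about (cx, cy) by angle_deg in image coordinates."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (
        cx + dx * math.cos(a) - dy * math.sin(a),
        cy + dx * math.sin(a) + dy * math.cos(a),
    )


def rotate_bbox(bbox, angle_deg, width, height):
    """Rotate the corners of (x0, y0, x1, y1) about the image center.

    Returns the axis-aligned box that encloses the rotated corners.
    """
    x0, y0, x1, y1 = bbox
    cx, cy = width / 2, height / 2  # assumption: rotation about the image center
    corners = [
        rotate_point(x, y, angle_deg, cx, cy)
        for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))
    ]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, `rotate_bbox((10, 20, 30, 40), -2.5, 1000, 800)` would map one word box through a hypothetical 2.5 degree skew correction. Because the result is the enclosing axis-aligned box, it is slightly larger than the original for angles that are not multiples of 90 degrees.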

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread ShreeDevi Kumar
Please open an issue, as the problem is related to --psm 0. - excuse the brevity, sent from mobile On 13-Apr-2017 9:29 AM, "Pritam Dodeja" wrote: > Find below - I can also ship my docker container to you if you want so you > can see my exact setup, it's about 1.15GB > >

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread Pritam Dodeja
Find below - I can also ship my docker container to you if you want so you can see my exact setup, it's about 1.15GB Pritam On Wednesday, April 12, 2017 at 10:09:35 PM UTC-4, shree wrote: > > Which operating system - Ubuntu 16.10 Yakkety Yak on x86_64 > Which version/commit of tesseract - top

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread Pritam Dodeja
The command below also produces the same result ( segmentation fault ) tesseract a.jpg stdout --oem 1 --psm 0 -l eng Pritam On Wednesday, April 12, 2017 at 10:56:09 AM UTC-4, shree wrote: > > See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage > > Follow correct order of

Re: [tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage Follow correct order of variables tesseract imagename|stdin outputbase|stdout [options...] [configfile...] ShreeDevi भजन - कीर्तन - आरती @
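
The argument order described above (image first, then output base, then options, then config files) can be sketched as a small helper that builds the command line. This is an illustration only; `build_tesseract_args` is a hypothetical name, and the options shown are the ones already used in this thread.

```python
def build_tesseract_args(image, outputbase, options=(), configfiles=()):
    """Return an argv list in the order: image, output base, options, config files."""
    return ["tesseract", image, outputbase, *options, *configfiles]


args = build_tesseract_args(
    "a.jpg", "stdout", ["--oem", "1", "--psm", "0", "-l", "eng"]
)
print(" ".join(args))  # tesseract a.jpg stdout --oem 1 --psm 0 -l eng
```

The resulting list could be passed to `subprocess.run(args)` on a machine where tesseract is installed.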

[tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread Pritam Dodeja
The command was the following: tesseract -l eng --oem 1 --psm 0 a.jpg stdout As far as where it occurred exactly, I can't tell. I have been able to reproduce this with multiple jpgs - let me know if you need any further info tesseract --version shows tesseract 4.00.00alpha leptonica-1.74.1

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
LSTM training is not like legacy training. Please read the wiki pages regarding 4.0 training. I have given all sample commands there. There are 3 different ways of training. Read the bash scripts regarding training to know more. tesstrain.sh with --linedata-only creates the box tiff pairs but

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Sorry, I gave the wrong commands for Arabic. Actually I was referring to English. tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train unicharset_extractor eng.arial.exp4.box echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font mftraining -F

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Arabic was never trained with the legacy tesseract engine and I doubt you will get any improvement over existing traineddata using cube or lstm. You are free to experiment and see what you come up with. I have pointed to the bash scripts for training. Please refer to them for the correct

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Hello Shree, Thank you for your valuable reply. Are there any changes I need to make to the steps below? Please suggest changes to the commands below; these are for Tesseract 3.0: tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train unicharset_extractor ara.arial.exp4.box

Re: [tesseract-ocr] Re: Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-12 Thread ShreeDevi Kumar
You can use jtessboxeditor to edit the box files. Make sure to mark EOL if you are trying to train using scanned images. Also note that this part of the code is untested - training 4.0 using pre-existing images and box files. Ray has only explained the method for using images created by text2image.

Re: [tesseract-ocr] Re: train tesseract OCR 4.0

2017-04-12 Thread srnsp92
I am able to train Tesseract with the fine-tuning technique using some training text (not images), and I want to know how to train Tesseract with images and box files. I am getting confused because when I give this tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train command, tr files

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh if ((LINEDATA)); then phase_E_extract_features "lstm.train" 8 "lstmf" make__lstmdata else phase_E_extract_features "box.train" 8 "tr" phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto" if

[tesseract-ocr] Re: segmentation fault with tesseract 4

2017-04-12 Thread srnsp92
Can you tell me when you got this, that is, which command produced this error and at which step? On Wednesday, April 12, 2017 at 12:16:54 PM UTC+5:30, Pritam Dodeja wrote: > > Hi, > > I get segmentation faults when using page segmentation mode 0. Has anyone > else

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread srnsp92
Can you please tell me whether the command -> tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train is right or not for Tesseract 4, for training with image files? It produces .tr files when I give this command in Tesseract 4. On Wednesday, April 12, 2017 at 2:19:24 PM UTC+5:30, shree

[tesseract-ocr] Re: Tesseract (4 alpha ) Amibiguos Situation while Correcting Chars in box file

2017-04-12 Thread srnsp92
Can you please tell me how to split a box and how to merge two boxes? I am not able to find any options for this. If you can specify them, it will be helpful to me and to others as well. Thank you. On Tuesday, April 11, 2017 at 9:10:14 AM UTC+5:30, Quan Nguyen wrote: > > For Case 1, you'll need

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread Ahmad Moawad
Thanks, Shree, for your reply; I appreciate it. My question: is that the right path for training Tesseract 4.0 LSTM or not? On Wednesday, April 12, 2017 at 10:49:24 AM UTC+2, shree wrote: > > Read the bash scripts in > > tesstrain.sh > tesstrain_utils.sh > language_specific.sh > > In training

Re: [tesseract-ocr] Train Tesseract 4.0 LSTM based on images

2017-04-12 Thread ShreeDevi Kumar
Read the bash scripts tesstrain.sh, tesstrain_utils.sh, and language_specific.sh in the training directory to understand LSTM training in more detail. - excuse the brevity, sent from mobile On 12-Apr-2017 10:47 AM, "Ahmad Moawad" wrote: > this is the part from

Re: [tesseract-ocr] Help in TrainingTesseract 4.00 Finetune

2017-04-12 Thread ShreeDevi Kumar
--linedata-only means that it will only try to create lstmf files and not the files for 3.0x training - excuse the brevity, sent from mobile On 12-Apr-2017 10:39 AM, "Ahmad Moawad" wrote: > Hello All, > > I want help in training Tesseract 4.00 Finetune >

[tesseract-ocr] segmentation fault with tesseract 4

2017-04-12 Thread Pritam Dodeja
Hi, I get segmentation faults when using page segmentation mode 0. Has anyone else experienced this? Pritam

[tesseract-ocr] Need to use Tesseract -OCR with C# Windows application-First time user need example

2017-04-12 Thread Pavan Mahajan
Hi, I am new to Tesseract-OCR and want to use it in one of my Windows applications written in C#. I tried a few things but was unable to get the results I need. Can anybody help me with a simple example of how to use this OCR to convert a scanned PDF file to searchable text? Thanks in advance.