Hi, I gave a skewed image as input to Tesseract OCR.
Tesseract de-skews the image when generating the plain-text output, and that
output is correct.
But in the hOCR output, the bounding-box coordinates of the words are
relative to the original skewed image.
So, is there any way to get the coordinates
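One possible way to relate hOCR boxes to a de-skewed page is to rotate each box's corners by the skew angle. The Python sketch below illustrates that idea only. It assumes the skew angle is already known from some other source and that the rotation is about the image centre, which may not match what Tesseract does internally. The names `parse_bbox`, `rotate_bbox` and `skew_deg` are hypothetical and not part of Tesseract; only the `bbox x0 y0 x1 y1` form of the hOCR `title` attribute comes from the hOCR format itself.

```python
import math
import re

# hOCR stores the box as "bbox x0 y0 x1 y1" inside the title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def parse_bbox(title):
    """Extract (x0, y0, x1, y1) from an hOCR title attribute string."""
    match = BBOX_RE.search(title)
    if match is None:
        raise ValueError(f"no bbox in title: {title!r}")
    return tuple(int(group) for group in match.groups())


def rotate_bbox(bbox, skew_deg, width, height):
    """Rotate the four corners about the image centre (an assumption) and
    return the axis-aligned box that encloses the rotated corners."""
    x0, y0, x1, y1 = bbox
    cx, cy = width / 2.0, height / 2.0
    angle = math.radians(skew_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    xs, ys = [], []
    for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1)):
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_a - dy * sin_a)
        ys.append(cy + dx * sin_a + dy * cos_a)
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, `rotate_bbox(parse_bbox("bbox 10 20 110 60; x_wconf 93"), skew_deg, width, height)` would map one word box, with the sign of `skew_deg` chosen to match the direction of the correction.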
Please open an issue, as the problem is related to --psm 0.
- excuse the brevity, sent from mobile
On 13-Apr-2017 9:29 AM, "Pritam Dodeja" wrote:
> Find below - I can also ship my docker container to you if you want so you
> can see my exact setup, it's about 1.15GB
>
>
Find below - I can also ship my docker container to you if you want so you
can see my exact setup, it's about 1.15GB
Pritam
On Wednesday, April 12, 2017 at 10:09:35 PM UTC-4, shree wrote:
>
> Which operating system - Ubuntu 16.10 Yakkety Yak on x86_64
> Which version/commit of tesseract - top
The command below also produces the same result (segmentation fault):
tesseract a.jpg stdout --oem 1 --psm 0 -l eng
Pritam
On Wednesday, April 12, 2017 at 10:56:09 AM UTC-4, shree wrote:
>
> See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
>
> Follow correct order of
See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
Follow the correct order of arguments:
tesseract imagename|stdin outputbase|stdout [options...] [configfile...]
ShreeDevi
Bhajan - Kirtan - Aarti @
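The argument order given above can be captured in a small Python helper. This is only a sketch: `build_tesseract_cmd` is a hypothetical name, and the helper does nothing beyond placing the image first, the output base second, then options, then config files.

```python
def build_tesseract_cmd(image, outputbase="stdout", options=(), configfiles=()):
    """Return an argv list ordered as:
    tesseract imagename|stdin outputbase|stdout [options...] [configfile...]
    """
    return ["tesseract", image, outputbase, *options, *configfiles]


cmd = build_tesseract_cmd(
    "a.jpg", options=["--oem", "1", "--psm", "0", "-l", "eng"]
)
print(" ".join(cmd))  # prints: tesseract a.jpg stdout --oem 1 --psm 0 -l eng
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting problems with file names that contain spaces.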
The command was the following:
tesseract -l eng --oem 1 --psm 0 a.jpg stdout
As for where exactly it occurred, I can't tell. I have been able to
reproduce this with multiple jpgs - let me know if you need any further info.
tesseract --version shows:
tesseract 4.00.00alpha
leptonica-1.74.1
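When collecting details for a crash report like this one, it can help to confirm from a script that the process really died from a segmentation fault. The Python sketch below relies on the documented POSIX behaviour of `subprocess`, where a negative return code `-N` means the child was terminated by signal `N`. The function names are hypothetical, and this only reports how the process ended, not where in Tesseract the fault occurred.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # POSIX: -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


def run_and_describe(argv):
    """Run argv, discard its output, and describe how it ended."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return describe_returncode(proc.returncode)
```

A call such as `run_and_describe(["tesseract", "a.jpg", "stdout", "--oem", "1", "--psm", "0", "-l", "eng"])` could then be looped over several images to list which ones trigger the fault.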
LSTM training is not like legacy training. Please read the wiki pages
on 4.0 training. I have given all the sample commands there. There are 3
different ways of training.
Read the bash scripts for training to learn more.
tesstrain.sh with --linedata-only creates the box tiff pairs but
Sorry, I gave the wrong commands for Arabic. Actually I was referring to
English.
tesseract eng.arial.exp4.tif eng.arial.exp4 nobatch box.train
unicharset_extractor eng.arial.exp4.box
echo "arial 0 0 1 0 0" > font_properties # tell Tesseract information about the font
mftraining -F
Arabic was never trained with the legacy Tesseract engine, and I doubt you
will get any improvement over the existing traineddata using cube or LSTM.
You are free to experiment and see what you come up with.
I have pointed to the bash scripts for training. Please refer to them for
the correct
Hello Shree, thank you for your valuable reply. Are there any changes I
need to make to the steps below? Please suggest what to change in the
commands below; these are for Tesseract 3.0:
tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
unicharset_extractor ara.arial.exp4.box
You can use jTessBoxEditor to edit the box files. Make sure to mark EOL if
you are trying to train using scanned images.
Also note that this part of the code is untested: training 4.0 using
pre-existing images and box files.
Ray has only explained the method for using images created by text2image.
I am able to train Tesseract with the fine-tuning technique using some
training text (not images), and I want to know how to train Tesseract with
images and box files. I am getting confused because when I give the command
tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
tr files
See
https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
if ((LINEDATA)); then
  phase_E_extract_features "lstm.train" 8 "lstmf"
  make__lstmdata
else
  phase_E_extract_features "box.train" 8 "tr"
  phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
  if
Can you tell me when you got this, i.e. which command produced this error
and at which step?
On Wednesday, April 12, 2017 at 12:16:54 PM UTC+5:30, Pritam Dodeja wrote:
>
> Hi,
>
> I get segmentation faults when using page segmentation mode 0. Has anyone
> else
Can you please tell me whether the command
tesseract ara.arial.exp4.tif ara.arial.exp4 nobatch box.train
is right for Tesseract 4? It produces .tr files when I run it in
Tesseract 4 for training from image files.
On Wednesday, April 12, 2017 at 2:19:24 PM UTC+5:30, shree
Can you please tell me how to split a box and how to merge two boxes?
I am not able to find any options for this. If you can point them out,
it will be helpful to me and to others as well.
Thank you.
On Tuesday, April 11, 2017 at 9:10:14 AM UTC+5:30, Quan Nguyen wrote:
>
> For Case 1, you'll need
Thanks for your reply, Shree, I appreciate it. My question: is that the
right path for training Tesseract 4.0 LSTM or not?
On Wednesday, April 12, 2017 at 10:49:24 AM UTC+2, shree wrote:
>
> Read the bash scripts in
>
> tesstrain.sh
> tesstrain_utils.sh
> language_specific.sh
>
> In training
Read the bash scripts
tesstrain.sh
tesstrain_utils.sh
language_specific.sh
in the training directory to understand LSTM training in more detail.
On 12-Apr-2017 10:47 AM, "Ahmad Moawad" wrote:
> this is the part from
--linedata-only means that it will only try to create lstmf files and not
the files for 3.0x training.
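The effect of the flag can be summarised by mirroring the branch from the tesstrain.sh excerpt quoted earlier in this thread. The Python sketch below is only a restatement of that excerpt under a hypothetical function name. It is not the script's full logic, and the legacy branch continues past the point where the excerpt is cut off.

```python
def feature_extraction_plan(linedata_only):
    """Mirror the quoted tesstrain.sh branch: which config file is passed to
    feature extraction and which file extension it produces."""
    if linedata_only:
        # --linedata-only: LSTM line data only, no legacy (3.0x) files.
        return {"config": "lstm.train", "extension": "lstmf", "legacy_steps": False}
    # Without the flag, the legacy path runs and produces .tr files.
    return {"config": "box.train", "extension": "tr", "legacy_steps": True}
```

Read this way, a run with the `box.train` config producing `.tr` files is the legacy path, which is consistent with the `.tr` output reported earlier in the thread.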
On 12-Apr-2017 10:39 AM, "Ahmad Moawad" wrote:
> Hello All,
>
> I want help in training Tesseract 4.00 Finetune
>
Hi,
I get segmentation faults when using page segmentation mode 0. Has anyone
else experienced this?
Pritam
Hi,
I am new to Tesseract OCR and want to use it in one of my Windows
applications, written in C#.
I tried a few things but was unable to get the results I need.
Can anybody help me with a simple example of how to use this OCR to convert
a scanned PDF file to searchable text?
Thanks in advance.