Hi Ayush,
Usually images are denoised much more. I think the standard models are
trained on pure black text on a pure white background, maybe with a little noise.
I think it could work even on these images, especially with fine-tuning, but
this is not the typical training data, so I'm not surprised you have
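Lorenzo's point is that the stock models expect clean, pure black-on-white lines. One common way to push a noisy scan in that direction is global thresholding. The sketch below uses Otsu's method, which is an editorial choice of technique rather than anything proposed in the thread. It assumes the image is already loaded as rows of 0-255 grayscale values; the `Image` alias and both helpers are hypothetical.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 ints.
Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the threshold t that maximizes between-class variance (pixels <= t are dark)."""
    hist = [0] * 256
    for row in img:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is now "dark"; no split left to test
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image) -> Image:
    """Map every pixel to 0 (black text) or 255 (white background)."""
    t = otsu_threshold(img)
    return [[0 if px <= t else 255 for px in row] for row in img]
```

Heavy noise or uneven lighting may need more than a single global threshold, so treat this as a quick check of whether cleaner input changes the result.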
Hi Lorenzo, Shree
- Here is the link to the images for which no lstmf files were generated:
https://drive.google.com/drive/folders/1VDBPB_k-oOXbWUI3zIlB3ljuyIlOkoMK?usp=sharing
- Here is the Makefile that I used for generating lstmf files:
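To build a list of images that produced no `.lstmf` file, it can help to compare the ground-truth files against the generated ones. A minimal sketch, assuming the tesstrain-style naming where `name.gt.txt` should produce `name.lstmf`; the directory arguments and the `missing_lstmf` helper are hypothetical.

```python
from pathlib import Path
from typing import List


def missing_lstmf(gt_dir: str, lstmf_dir: str, gt_suffix: str = ".gt.txt") -> List[str]:
    """Return base names that have a ground-truth file but no matching .lstmf file."""
    have = {p.stem for p in Path(lstmf_dir).glob("*.lstmf")}
    missing = []
    for gt in sorted(Path(gt_dir).glob("*" + gt_suffix)):
        base = gt.name[: -len(gt_suffix)]  # "line_001.gt.txt" -> "line_001"
        if base not in have:
            missing.append(base)
    return missing
```

Running it on the training directory gives the base names to inspect by hand, e.g. to check whether those images are much larger, noisier, or otherwise different from the ones that worked.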
Hi Ayush,
PSM 6 and 7 do some extra pre-processing of the image; 13 does much less.
Unless your image contains text like this:
I would not expect much difference between PSM 6/7 and 13. While PSM 13
solves some problems, I got more "ghost letters" errors (letters that are
repeated
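For comparing page segmentation modes from a script, the command line can be assembled as below. This is a sketch assuming the Tesseract 4.x CLI (`tesseract IMAGE stdout -l LANG --psm N`, plus the optional `--dpi`); the helper name and the notes dictionary are illustrative, with the descriptions paraphrased from `tesseract --help-psm`.

```python
from typing import List, Optional

# Modes discussed in this thread, paraphrasing `tesseract --help-psm`.
PSM_NOTES = {
    6: "Assume a single uniform block of text.",
    7: "Treat the image as a single text line.",
    13: "Raw line: a single text line, bypassing Tesseract-specific hacks.",
}


def tesseract_cmd(image: str, psm: int = 13, lang: str = "eng",
                  dpi: Optional[int] = None) -> List[str]:
    """Build a tesseract 4.x command line that prints the recognized text to stdout."""
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be in 0..13, got {psm}")
    cmd = ["tesseract", image, "stdout", "-l", lang, "--psm", str(psm)]
    if dpi is not None:
        # Useful when the image has no (or a wrong) resolution in its header.
        cmd += ["--dpi", str(dpi)]
    return cmd
```

You could then run `subprocess.run(tesseract_cmd("line.png", psm=7), capture_output=True, text=True)` and compare its `stdout` against the `psm=13` result on the same (hypothetical) line image.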
Hi Lorenzo. The empty output was due to the fact that I was using 7 as the PSM
parameter. Using 13 as the PSM parameter completely eliminated the problem.
On Friday, September 6, 2019 at 12:34:22 PM UTC+5:30, Lorenzo Blz wrote:
Can you please share an example?
An empty output usually means that it failed to recognize the black parts
as text. This could be because the text is too big or too small, because of a
wrong dpi setting, or because the image is not reasonably clean.
To better understand the problem you can try to downscale the
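One way to follow the downscaling suggestion is to bring every line image to a fixed height while keeping its aspect ratio. The sketch below assumes images are rows of grayscale values and uses nearest-neighbour sampling for simplicity; the target height is a free parameter you would tune, not a value from this thread, and both helpers are hypothetical. In practice an image library such as Pillow (`Image.resize`) would do the resampling better.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical: rows of grayscale values


def scaled_size(width: int, height: int, target_height: int) -> Tuple[int, int]:
    """Size that brings the line to target_height while keeping the aspect ratio."""
    scale = target_height / height
    return max(1, round(width * scale)), target_height


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Nearest-neighbour resize: crude, but enough to test whether size is the issue."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]
```

Trying a few target heights on one failing image is a cheap way to see whether the empty output is a scale problem or a cleanliness problem.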
Hi Shree,
Thank you so much for your response. I also wanted to ask: I
do get an empty output on a lot of images after training, and the height and
width of these images in pixels are usually > 100. Apart from changing the PSM
value, is there any other way to reduce this?
On Thursday,
See
https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR#tesseract-fails-to-create-lstm-files
On Thu, Sep 5, 2019 at 1:25 PM Ayush Pandey wrote:
Tesseract Version: 4.1.0
I am trying to fine-tune Tesseract on a custom dataset with the following
Makefile:
export
SHELL := /bin/bash
HOME := $(PWD)
TESSDATA = $(HOME)/tessdata
LANGDATA = $(HOME)/langdata
# Train directory
# TRAIN := $(HOME)/train_data
TRAIN :=
Thank you for your further explanation, Shree!!
On Friday, May 3, 2019 at 2:59:12 AM UTC-7, shree wrote:
>
> >There are three model sizes: best, normal and fast. Each of these can
> also be converted to an integer model.
>
> Only `best` can be converted to integer and in fact the LSTM models in
Hi, Lorenzo,
Thank you very much for your reply. It really gives me more clues about the
training.
All the best,
Tairen
On Friday, May 3, 2019 at 2:30:12 AM UTC-7, Lorenzo Blz wrote:
>
> See answer inline.
>
> On Fri, May 3, 2019 at 03:48 Tairen Chen wrote:
Shree, thanks for the clarification.
On Fri, May 3, 2019 at 11:59 Shree Devi Kumar <shreesh...@gmail.com> wrote:
> There are three model sizes: best, normal and fast. Each of these can also
> be converted to an integer model.
Only `best` can be converted to integer, and in fact the LSTM models in
`tessdata` are the integer versions of `best`, along with the base/legacy
models.
`fast` models have been trained with
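For reference, the float-to-integer conversion is done with `lstmtraining --stop_training`, which writes a `.traineddata` from a training checkpoint; adding `--convert_to_int` produces the smaller integer model. The sketch below only assembles the command, follows the pattern I believe the tesstrain Makefile uses, and uses hypothetical file names; check `lstmtraining --help` for your version.

```python
from typing import List


def finalize_cmd(checkpoint: str, traineddata: str, output: str,
                 to_int: bool = False) -> List[str]:
    """Build the lstmtraining call that turns a checkpoint into a .traineddata file.

    With to_int=True the float ("best"-style) weights are converted to an
    integer model, which cannot be used as a base for further training.
    """
    cmd = ["lstmtraining", "--stop_training",
           "--continue_from", checkpoint,
           "--traineddata", traineddata,
           "--model_output", output]
    if to_int:
        cmd.insert(2, "--convert_to_int")
    return cmd
```

Keeping the float output as well as the integer one means you can still fine-tune from your own model later.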
See answer inline.
On Fri, May 3, 2019 at 03:48 Tairen Chen wrote:
>
> 1. I set "--max_iterations 2" but the training stops at
> 5700, like below:
> " At iteration 351/5700/5700, Mean rms=0.117%, delta=0%, char
> train=0%, word train=0%, skip ratio=0%,
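When reading progress lines like the one quoted above, it can help to pull the numbers out programmatically, for example to plot the error rate over time. A minimal parser, assuming the line format shown in this message; the names given to the three iteration counters are an assumption (my reading is learning, training and sample iterations), not something stated in the thread.

```python
import re
from typing import Dict, Optional

# Matches lines such as
# "At iteration 351/5700/5700, Mean rms=0.117%, delta=0%, char train=0%, ..."
_ITER_RE = re.compile(r"At iteration (\d+)/(\d+)/(\d+)")
_METRIC_RE = re.compile(r"([A-Za-z][A-Za-z ]*?)=([\d.]+)%")


def parse_progress(line: str) -> Optional[Dict[str, float]]:
    """Extract the iteration counters and percentage metrics from one log line."""
    m = _ITER_RE.search(line)
    if m is None:
        return None
    # Assumed labels for the three counters, in the order they appear.
    result: Dict[str, float] = {
        "learning_iteration": int(m.group(1)),
        "training_iteration": int(m.group(2)),
        "sample_iteration": int(m.group(3)),
    }
    # "Mean rms=0.117%" becomes result["mean_rms"] = 0.117, and so on.
    for name, value in _METRIC_RE.findall(line[m.end():]):
        result[name.strip().lower().replace(" ", "_")] = float(value)
    return result
```

Lines that do not contain a progress report return `None`, so the function can be applied to every line of a training log.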
Thank you very much for your quick answer, Lorenzo!
You are right: it was an extra space at the beginning of the line where
"TESSDATA" is defined, not in the "lstmtraining" line.
I still have a few questions I would like your help with.
1. I set "--max_iterations 2" but the
Hi Tairen,
The error is quite clear:
Must provide a --traineddata see training wiki
You say that it works if you run it as a single line, so I suppose there is
something wrong in the Makefile, probably a typo. Maybe there is a space
or a tab after a "\"?
Maybe there are some extra characters
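Since the culprit turned out to be stray whitespace, a small checker can catch the usual suspects before running `make`. This is a heuristic sketch, not a Makefile parser: it flags whitespace after a line-continuation backslash, leading spaces on a variable assignment, and trailing whitespace on an assignment line (GNU make keeps trailing whitespace in a variable's value, so `$(TESSDATA)/eng.traineddata` can end up with a space in the path). The `lint_makefile` helper is hypothetical.

```python
import re
from typing import List, Tuple

# A simple "NAME = value" / "NAME := value" line, possibly indented.
_ASSIGN_RE = re.compile(r"^(\s*)[A-Za-z_][A-Za-z0-9_]*\s*(?::=|\?=|\+=|=)")


def lint_makefile(text: str) -> List[Tuple[int, str]]:
    """Heuristically flag whitespace that often breaks hand-edited Makefiles."""
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if re.search(r"\\[ \t]+$", line):
            # "\ " no longer continues the line, so the following options are lost.
            problems.append((lineno, "whitespace after trailing backslash"))
        m = _ASSIGN_RE.match(line)
        if m is None:
            continue
        if " " in m.group(1):
            problems.append((lineno, "leading space before variable assignment"))
        if line != line.rstrip(" \t"):
            # GNU make keeps trailing whitespace in the variable's value.
            problems.append((lineno, "trailing whitespace in variable value"))
    return problems
```

Call it on the text of your Makefile and inspect the reported line numbers; tab-indented recipe lines are not flagged for leading whitespace.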
Hi, Lorenzo and Shree
Thanks for sharing.
I am trying to reproduce what you have done here.
I followed your posts and changed the Makefile, but when I run `make training`
I get the following errors:
mkdir -p data/checkpoints
lstmtraining \
I'm having a hard time training Tesseract as I am new to this. Is it possible
to get the updated code for fine-tuning now that langdata is not
supported? https://github.com/OCR-D/ocrd-train/issues/49
On Friday, 29 June 2018 08:09:09 UTC-4, shree wrote:
>
> I modified the makefile for ocrd-train
Thank you so much. That worked. :)
On Tuesday, September 18, 2018 at 9:24:53 PM UTC+5:30, shree wrote:
If you are getting the error
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
!int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244
you are probably using a traineddata file that contains an `integer` model.
Please use tessdata_best as the base for further training.
On Tue,
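As a sketch of the recommended setup: extract the float LSTM from a `tessdata_best` traineddata with `combine_tessdata -e`, then continue training from it with `lstmtraining`. To the best of my knowledge these are the flags used in the fine-tuning example of the Tesseract 4 training wiki; the paths and helper names are hypothetical, and this covers only the simple case where the character set does not change.

```python
from typing import List


def extract_lstm_cmd(traineddata: str, lstm_out: str) -> List[str]:
    """combine_tessdata -e extracts the LSTM component from a traineddata file."""
    return ["combine_tessdata", "-e", traineddata, lstm_out]


def finetune_cmd(base_lstm: str, traineddata: str, output_prefix: str,
                 train_list: str, max_iterations: int) -> List[str]:
    """Continue training from a float LSTM taken from tessdata_best.

    Starting from an integer model (e.g. the default tessdata or tessdata_fast)
    is what triggers the int_mode_ assertion quoted above.
    """
    return ["lstmtraining",
            "--continue_from", base_lstm,
            "--traineddata", traineddata,
            "--model_output", output_prefix,
            "--train_listfile", train_list,
            "--max_iterations", str(max_iterations)]
```

Both lists can be passed to `subprocess.run(..., check=True)` in order: extraction first, then training.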
Hi Shree,
I replaced the line:
merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@"
with:
cp "$(TRAIN)/my.unicharset" "data/unicharset"
(I'm writing this in case someone else is following this thread.)
And now I have a brand-new fine-tuned model with only
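With `my.unicharset` copied in place of the merged one, the fine-tuned model can presumably only output characters that occur in the training transcriptions. A quick way to see that set before training is sketched below, assuming UTF-8 `*.gt.txt` files in one directory. It counts Unicode code points, which only approximates what `unicharset_extractor` produces, since that tool may treat normalization and combining sequences differently. The helper is hypothetical.

```python
from pathlib import Path
from typing import Set


def ground_truth_charset(gt_dir: str, suffix: str = ".gt.txt") -> Set[str]:
    """Every non-whitespace character that appears in the ground-truth transcriptions."""
    chars: Set[str] = set()
    for gt in Path(gt_dir).glob("*" + suffix):
        text = gt.read_text(encoding="utf-8")
        chars.update(ch for ch in text if not ch.isspace())
    return chars
```

For example, if the result contains only uppercase letters and digits, a model trained with that unicharset should not be able to emit lowercase letters at all.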
> The problem was a "-gt.txt" rather than a ".gt.txt" as in my train files.
> Now I can run your script directly.
Oh, I remember now. I had changed that for ease in renaming files for some
reason.
> In this way, can I train a model that, for example, only recognizes
> uppercase characters, or
I think I found the problem. Running the new Makefile directly, I got this
error:
make: *** No rule to make target 'data/train/alexis_ruhe01_1852_0018_022.box', needed by 'data/all-boxes'. Stop.
The problem was a "-gt.txt" rather than a ".gt.txt" as in my train files.
Now I can run your script
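If your ground-truth files follow a different naming convention from the one the Makefile expects, you can rename the files instead of editing the Makefile. A minimal sketch; the helper is hypothetical, and it refuses to overwrite an existing file.

```python
from pathlib import Path
from typing import List


def rename_suffix(directory: str, old: str, new: str) -> List[str]:
    """Rename every file ending in `old` so that it ends in `new`; return the new names."""
    renamed = []
    for path in sorted(Path(directory).iterdir()):  # list is built before any rename
        if path.is_file() and path.name.endswith(old):
            target = path.with_name(path.name[: -len(old)] + new)
            if target.exists():
                raise FileExistsError(target)
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

For example, `rename_suffix("data/train", "-gt.txt", ".gt.txt")` converts `x-gt.txt` to `x.gt.txt`; swap the last two arguments for the opposite direction.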
You should be able to use the new Makefile after you change all the
directory locations to match your setup.
Change the language from frk to eng; however, the sample training text seems
to be non-English, in which case it is better for you to use the traineddata
for the appropriate language, e.g.
Hi Shree, thanks for your answer.
I tried the script with these settings:
TESSDATA=extracted # here I have eng.lstm and eng.traineddata
LANGDATA=langdata-master # all langdata downloaded by OCR-D
MODEL_NAME = eng
CONTINUE_FROM = eng
First I ran the old Makefile to create the boxes.
I modified the makefile for ocrd-train to do fine-tuning. It is pasted
below:
export
SHELL := /bin/bash
LOCAL := $(PWD)/usr
PATH := $(LOCAL)/bin:$(PATH)
HOME := /home/ubuntu
TESSDATA = $(HOME)/tessdata_best
LANGDATA = $(HOME)/langdata
# Name of the model to be built
MODEL_NAME = frk
# Name
Hi,
I'm trying to fine-tune an existing model using line images and
text labels. I'm running this version:
tesseract 4.0.0-beta.3-56-g5fda
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2