> Where are these scripts, or how can I otherwise generate training text
> from dictionary/corpus data?
These are (most probably) internal scripts at Google which have not been
open sourced.
Please see
Traineddata size will depend on many things, not just the number of images.
If your unicharset and number of fonts haven't changed, then the size may be
similar.
The traineddata file also contains the wordlists, so if you are using a
smaller wordlist than the one in the original eng.traineddata, the size
Check that the file is there:
ls -l /home/ibr/tesstutorial/impact_from_full/jpn.lstm
ShreeDevi
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
On Wed, Jun 14, 2017 at 7:28 PM, Ibr wrote:
> yes I already
> what is the difference between the engtrain and engeval?
It will depend on what fonts and training text you use for each.
One is used for training; the other is used to evaluate the training.
ShreeDevi
You need to extract the .lstm file from the traineddata,
e.g. (change folder names to match your setup):
combine_tessdata -e ../tessdata/jpn.traineddata jpn.lstm
Extracting tessdata components from ../tessdata/jpn.traineddata
Wrote jpn.lstm
0:config:size=2573, offset=168
1:unicharset:size=280627, offset=2741
combine_tessdata -e extracts the lstm file from the traineddata that Google
created in its original training.
> tesstrain.sh it will create .lstmf files
Yes. These are created from the box/tiff pairs generated from the training
text and fonts.
---
You have to be clear on which files you are combining.
The command you have given overwrites the Japanese traineddata. Is that
what you want to do?
> training/combine_tessdata -o tessdata/jpn.traineddata
Look at the help for all options of combine_tessdata.
Figure out which files (lstm, dawg
tesseract image results -l ara --tessdata-dir ./tessdata --oem 1
uses the LSTM files that are in ara.traineddata in your tessdata
directory.
Just placing lstm files in the tesseract folder is not going to change
anything.
You need to create a new traineddata with the new lstm files and
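A minimal sketch of that step, assuming the Tesseract 4.0 training tools are on the PATH and the retrained model was saved as a file ending in .lstm. It copies the original traineddata first, so tessdata/jpn.traineddata is not overwritten, and then uses combine_tessdata -o to replace only the lstm component. As far as I know, -o picks the component to replace from the file extension. The function name and all file names are hypothetical. COMBINE can be set to echo to print the command instead of running it.

```shell
#!/bin/bash
# Sketch: build a new traineddata carrying a retrained LSTM model,
# leaving the original traineddata untouched. Paths are hypothetical.
COMBINE=${COMBINE:-combine_tessdata}

make_new_traineddata() {
  local original=$1   # e.g. tessdata/jpn.traineddata
  local new_lstm=$2   # e.g. jpn.lstm (must end in .lstm)
  local output=$3     # e.g. tessdata/jpn_new.traineddata
  # Work on a copy so the original file is never modified.
  cp "$original" "$output" || return 1
  # -o overwrites the given component(s) inside the copied traineddata.
  "$COMBINE" -o "$output" "$new_lstm"
}
```

For example, make_new_traineddata tessdata/jpn.traineddata jpn.lstm tessdata/jpn_new.traineddata, after which -l jpn_new --oem 1 should use the new model, provided the new file sits in your tessdata directory.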
Hari,
Please also look in the leptonica program directory
for
pdf2tiff
pdf2mtiff
etc
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
>Pix pix = b.Convert(bitmap);
>
> This is not leptonica code. It shouldn't compile, with b being a ptr
> that is dereferenced with a ".". This is then set equal to a pix which is
> (as written) not a ptr either, causing a copy if it were correct.
>
>
> On Mon, Jun 12, 201
see https://github.com/tesseract-ocr/tesseract/issues/928
ShreeDevi
On Mon, Jun 12, 2017 at 3:58 PM, Ibr wrote:
> Hi,
>
> When I want to detect an image on
Image processing within tesseract is done by Leptonica.
https://github.com/DanBloomberg/leptonica
+ Dan Bloomberg
ShreeDevi
On Mon, Jun 12, 2017 at 11:25 AM, Hari.K
Technical documentation links
https://github.com/tesseract-ocr/tesseract/wiki/Technical-Documentation
+ Quan
Quan will be better able to advise regarding .NET.
Also see https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
ShreeDevi
On Fri, Jun 9, 2017 at 10:44 AM,
Have you tried using Ghostscript to convert the pdf to tif files instead?
Example commands:
gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=106 -dLastPage=109 -o ./tulasi/tulasikrishna%00d.tif "TulasiPuja.pdf"
for one tif per page
gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=126 -dLastPage=131
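The one-tif-per-page command can be wrapped in a small function so the page range and output name become parameters. This is a sketch. The function name and the zero-padded %03d file pattern are my own choices and do not come from the original command. -o sets the output file and, as far as I know, also implies -dBATCH -dNOPAUSE. GS can be set to echo for a dry run.

```shell
#!/bin/bash
# Sketch: render a page range of a PDF to one Group 4 TIFF per page.
GS=${GS:-gs}

pdf_pages_to_tiff() {
  local pdf=$1 first=$2 last=$3 outdir=$4 prefix=$5
  mkdir -p "$outdir" || return 1
  # %03d is replaced by Ghostscript with a zero-padded page number,
  # so each rendered page goes to its own file.
  "$GS" -r600x600 -sDEVICE=tiffg4 \
    -dFirstPage="$first" -dLastPage="$last" \
    -o "$outdir/${prefix}%03d.tif" "$pdf"
}
```

For example, pdf_pages_to_tiff TulasiPuja.pdf 106 109 ./tulasi tulasikrishna runs the same conversion as the first command above.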
As far as I know, the traineddata files for 3.04 (also usable for 3.05) are
github versions of the files posted on code.google.com for 3.02. So, I
would think 3.02 traineddata files will work with 3.05 but newer files will
not work with 3.02.
Best is to give it a try and report your results.
try latest code from
http://www.emgu.com/wiki/index.php/Version_History#Emgu.CV-3.2.0
I converted the bmp to png, tried it with command-line tesseract 4, and got the
correct result.
$ tesseract I.png stdout --oem 1 --psm 6
D
$ tesseract I.png stdout --oem 0 --psm 6
D
The original .bmp also works.
Yes, it should be in tessdata, like eng.user-words.
Please open an issue with details and link to this thread too, so that it
can be added.
Thanks!
ShreeDevi
On Mon, Jun 5, 2017
The file is there in langdata:
https://github.com/tesseract-ocr/langdata/blob/master/ita/ita.special-words
and is referred to in the language config file
https://github.com/tesseract-ocr/langdata/blob/master/ita/ita.config
ShreeDevi
tes a combined version to use for
recognition
ShreeDevi
On Mon, Jun 5, 2017 at 7:05 PM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:
> Comments from Ray regarding t
text2image --list_available_fonts --fonts_dir /mnt/c/Windows/Fonts
Replace the fonts directory with your fonts location,
e.g.
633: Times New Roman,
634: Times New Roman, Bold
635: Times New Roman, Bold Italic
636: Times New Roman, Italic
637: Trajan Pro
638: Trajan Pro Bold
639: Trebuchet MS
640:
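The list printed by text2image puts an index before each font name. Here is a minimal sketch for pulling out just the names. It assumes the "NNN: name" format shown above and keeps the rest of each line verbatim, including the trailing comma on regular styles. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: strip the leading "NNN: " index from text2image's font list.
# Lines without a name after the index (e.g. a bare "640:") are dropped.
font_names() {
  sed -n 's/^ *[0-9][0-9]*: //p'
}
```

Usage: text2image --list_available_fonts --fonts_dir /mnt/c/Windows/Fonts 2>&1 | font_names | grep '^Times New Roman'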
Read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
Follow the tutorials.
Are you training for 3.0 or 4.0?
Do you have spaces between the letters in your training text?
Read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
ShreeDevi
https://github.com/tesseract-ocr/tessdata
has the traineddata for 4.0.
>>>>>
>>>>> binaries from https://github.com/UB-Mannheim/tesseract/wiki
>>>>>
>>>>> Use for GUI - look for tesseract 4.0 versions
>>>>>
>>>>> gImages
>>> VietOCR https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
>>>
>>>
>>>
>>> ShreeDevi
>>>
Does configure need any change?? See earlier messages for details.
>> i can't manage to get an option for ./configure to use g++ instead of
gcc. If somebody knows how, i would be grateful.
Supported Compilers
- GCC 4.8 and above
- Clang 3.4 and above
- MSVC 2015, 2017
Other compilers might work, but are not officially supported.
ShreeDevi
On Wed, May 31, 2017
git pull origin
to get the latest source. I have built it today without any problems.
ShreeDevi
On Wed, May 31, 2017 at 6:32 PM, Youcef wrote:
> Hi,
>
>
/manisandro/gImageReader/releases
VietOCR
https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
ShreeDevi
On Wed, May 31, 2017 at 5:05 PM, ShreeDevi
https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
https://github.com/tesseract-ocr/tesseract/wiki
https://github.com/UB-Mannheim/tesseract/wiki
https://github.com/manisandro/gImageReader/releases
ShreeDevi
Is the output you posted using the 3.04 traineddata from the repo?
What PSM did you use?
Try using the experimental tesseract 4 version for Windows; see the wiki for
links.
On May 31, 2017 3:47 PM, "Mandeep Singh" wrote:
> I am using Window 8.1 and tesseract version 3.04.
>
Samuel,
Do the user-words work as expected after making this change?
Which version of tesseract are you using?
ShreeDevi
On Wed, May 31, 2017 at 2:35 AM, Samuel backus
Try the `hocr` output and see if it provides some of what you need.
I don't think tesseract will link to footnotes though it may recognize the
text.
ShreeDevi
On Tue, May 30, 2017 at
Ray is the best person to answer your questions. I can only share my
experience trying to train using Devanagari script.
Fine-tuning will work if all you want to change is the font, with the same
unicharset. This works well for Latin-script languages but not for
complex scripts.
e.g. for
Please see inline replies:
On Sun, May 28, 2017 at 4:53 PM, Akira Hayakawa wrote:
> I am new to tesseract. My aim is to use this software to analyze Japanese
> doc. The idea in my mind is to start from existing model and fine-tune it
> by new words that weren't correctly
>> be found in https://github.com/tesseract-ocr/tessdata/tree/3.04.00
>>
>> Zdenko
>>
>> On Wed, May 24, 2017 at 2:54 PM, ShreeDevi Kumar <shree...@gmail.com>
>> wrote:
>>
>>> cube traini
tesseract writes the file names to the console, so you can try the following:
tesseract list.txt stdout > output.txt 2>&1
or
tesseract list.txt stdout -c include_page_breaks=1 > output.txt 2>&1
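The list file is plain text with one image path per line. Here is a minimal sketch for generating it from a directory of scans. It assumes the images are .png or .tif files, and the function and directory names are hypothetical.

```shell
#!/bin/bash
# Sketch: write one image path per line into a list file that can be
# given to tesseract in place of a single image.
make_image_list() {
  local dir=$1 list=$2
  # Only direct children of $dir; sorted for a predictable page order.
  find "$dir" -maxdepth 1 -type f \( -name '*.png' -o -name '*.tif' \) | sort > "$list"
}
```

For example, make_image_list scans list.txt, then run either of the commands above.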
ShreeDevi
Cube training is not supported; no information is available for it. The cube
engine has been deleted from the latest code.
ShreeDevi
On Wed, May 24, 2017 at 2:51 PM, Merlin ArulPrakash <
Which O/S?
Which version of Tesseract?
How are you training?
Have you tried the packaged traineddata for Punjabi? What result do you get
with that?
ShreeDevi
On Wed, May 24, 2017 at
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
ShreeDevi
On Mon, May 22, 2017 at 8:31 PM,
Look at the examples in
https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/2ArchitectureAndDataStructures.pdf
ShreeDevi
On Mon, May 22, 2017 at 7:34 PM, Saliaj Adrian
also see
https://github.com/tesseract-ocr/tesseract/blob/master/contrib/genlangdata.pl
ShreeDevi
On Sat, May 20, 2017 at 10:12 AM, ShreeDevi Kumar <shreesh...@gmail.com>
Google has not shared its method of training with complete scripts etc. The
training instructions on the wiki are only a tutorial for learning about LSTM
training.
Please also see https://github.com/tesseract-ocr/tesseract/issues/644
ShreeDevi
As per Ray, 4500 fonts and 40 lines of text were used to create the
models of Latin-script languages, so I am not sure whether you can
replicate the model.
For language-specific exposure settings etc., see
1. Which --oem are you using with tesseract 4, the legacy engine or LSTM?
--oem 0 or --oem 1
2. Is Brazilian Portuguese very different from Portuguese? Please see the
training text and wordlists at
https://github.com/tesseract-ocr/langdata/tree/master/por
3. Provide a sample image with its ground
Which version of tesseract, and from which source?
Tesseract 4 (master branch) does not support Visual Studio 2010; please
check the changelog.
You can try the 3.05 branch or a newer Visual Studio.
On May 15, 2017 8:10 PM, "emna ouerteni" wrote:
> include tesseract ocr in
Please see
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
80 is the default. I think it means both 64 and 16 are applied.
train_mode int 80 Flags from TrainingFlags in lstmrecognizer.h Possible
values= 64 for Compress unicharset, 16 for round-robin training.
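80 = 64 + 16, which is consistent with both flags being set. Here is a small sketch that checks each bit with bash arithmetic. It only decodes the two values quoted from the parameter description and ignores any other bits. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: report which of the two documented train_mode flags are set.
decode_train_mode() {
  local mode=$1
  (( mode & 64 )) && echo "64: compress unicharset"
  (( mode & 16 )) && echo "16: round-robin training"
  return 0   # a missing flag is not an error
}
```

decode_train_mode 80 prints both lines, and decode_train_mode 64 prints only the first.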
ShreeDevi
Please note that so far I have not had success in improving the accuracy of
hindi traineddata with my experiments.
ShreeDevi
2017-05-10 22:07 GMT+05:30 ShreeDevi Kumar <shre
shree wrote:
>>
>> Attached is the output I get with
>>
>> tesseract nep_text_11.png nep_text_11 --oem 1 --psm 6 -l hin
>>
>>
>> ShreeDevi
>>
Try the option for multiple languages:
-l eng+
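The -l value is a list of language codes joined with +, and each code needs its own traineddata in the tessdata directory. Here is a trivial helper, hypothetical and only for illustration. eng and hin are example codes, not a recommendation for any particular image.

```shell
#!/bin/bash
# Sketch: join language codes with "+" for tesseract's -l option.
join_langs() {
  local IFS=+   # "$*" joins arguments with the first character of IFS
  echo "$*"
}
```

Usage: tesseract image.png out -l "$(join_langs eng hin)" expands to -l eng+hin.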
ShreeDevi
On Tue, May 9, 2017 at 9:47 PM, wrote:
> Hi Community,
>
> Can someone please tell me how to
Attached is the output I get with
tesseract nep_text_11.png nep_text_11 --oem 1 --psm 6 -l hin
ShreeDevi
2017-05-09 21:11 GMT+05:30 ShreeDevi Kumar <shreesh...@gmail.com>:
Thanks. Please provide the 'ground truth', i.e. the original accurate text for
the image.
Have you tried to OCR the same image with the following options?
--oem 1 --psm 6 -l hin
Sometimes the Hindi traineddata gives better results.
On May 9, 2017 9:05 PM, "Nirajan Pant" wrote:
> Here is a sample
https://github.com/tesseract-ocr/tesseract/wiki/Compiling
The master branch on GitHub is for 4.0.0alpha.
ShreeDevi
On Tue, May 9, 2017 at 7:35 PM, sfo wrote:
>
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
ShreeDevi
On Tue, May 9, 2017 at 7:29 PM, sfo wrote:
> hello! where can i find tesseract 4.0
Box files are generated after the tif. The script works on 8 fonts at a
time.
ls -l /tmp/tmp.Vu25eURnxk/eng/*.*
will show you all generated files.
ShreeDevi
On Tue, May 9, 2017
see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
for info about training.
ShreeDevi
On Tue, May 9, 2017 at 12:38 PM, ShreeDevi Kumar <shreesh...@gmail.
Please provide sample of 'not giving good results' and samples of lines not
being recognized correctly. Images and ground truth files will be helpful.
ShreeDevi
On Tue, May 9, 2017 at
Most probably the API example has not been updated for tesseract 4.
There have been many changes; please see https://abi-laboratory.pro/tracker/timeline/tesseract/
ShreeDevi
On Sun,
When using pre-existing box/tiff pairs, you have to add a box with a tab
character to mark the end of each line, and also add a box with a space after
every word.
You then need to generate the .lstmf files; please
see training/tesstrain.sh for details.
ShreeDevi
Please provide your original image for testing. Thanks!
ShreeDevi
On Thu, May 4, 2017 at 5:36 PM, 'Thomas Zipproth' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:
> We
Ibr,
Your description of LSTM training is incorrect.
What you are doing will use the ara.traineddata provided in the repo, so there
will be no change in output.
Once the lstmf files are created, you have to run lstmtraining, which will run
for days or weeks to give you a good result.
Please read
tesseract is not meant for OCR of handwriting.
ShreeDevi
On Tue, May 2, 2017 at 1:02 PM, Jaya Kumar wrote:
> Hi ,
>
> I have a image document and I am trying
Stefan,
Please make the mac binaries available for both 3.05 and 4.00 similar to
windows.
I noticed that you have posted the test version for standalone Tess.
Thanks!
PS: Are the Travis created binaries available for download by users?
On May 1, 2017 7:30 PM, "'Stefan Weil' via tesseract-ocr" <
See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
- excuse the brevity, sent from mobile
On 27-Apr-2017 9:04 PM, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:
> tesseract output is plain text only, you will not get rich text with fonts
tesseract output is plain text only, you will not get rich text with fonts
etc.
ShreeDevi
On Thu, Apr 27, 2017 at 7:25 PM, Jaya Kumar wrote:
> Hi
> I am
I built both from source yesterday.
Try the following for building tesseract:
./autogen.sh
./configure
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make
sudo make install
sudo ldconfig
as given on the Compiling page of the wiki.
On 25-Apr-2017 2:14 PM,
See
https://github.com/tesseract-ocr/tesseract/wiki/User-App-Example
https://github.com/tesseract-ocr/tesseract/wiki/APIExample
On 25-Apr-2017 12:11 PM, "Dhairya Shah" wrote:
> Dear All,
> I am absolute complete beginner with
362b68e)
ShreeDevi
On Sun, Apr 23, 2017 at 9:25 AM, ShreeDevi Kumar <shreesh...@gmail.com>
wrote:
> Try training using more samples of 8, 9, B etc.
>
> What res
Try training using more samples of 8, 9, B etc.
What results do you get with the provided eng.traineddata? Are they better
or worse?
Have you tried changing the DPI of the image to 300?
On 22-Apr-2017 10:29 PM, "James Abney" wrote:
> Oh yes
Which version of Tesseract? Which O/S?
If all your text is in tungsten-semibold, have you tried training with just
that font?
On 22-Apr-2017 12:50 AM, "James Abney" wrote:
The font is tungsten semibold
On Friday, April 21, 2017 at
If you want to OCR an invoice like the sample you posted, just use the
eng.traineddata and OCR the page. You do not need to do any training.
Here is the output I get
8633 0410 NO RP 11 07122015 NYNN 01 01 0001 Page 2 Of 3
Did you know?
Your Comcast Business Internet
service gives
You can check that these are installed by entering the following:
which text2image
The above will show you the location where it is installed.
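To check several training tools at once, here is a small sketch built on the shell builtin command -v, which works much like which. You choose the tool names to pass in. text2image, lstmtraining and combine_tessdata are the ones mentioned in this thread. The function returns non-zero if any tool is missing, and its name is hypothetical.

```shell
#!/bin/bash
# Sketch: report which of the given tools are on the PATH.
check_tools() {
  local tool path missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "found: $tool -> $path"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_tools text2image lstmtraining combine_tessdata || echo "build the training tools first"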
If you don't have training tools, you will need to build them separately -
see https://github.com/tesseract-ocr/tesseract/wiki/Compiling
make training
sudo make
I haven't built 3.05 so cannot help. I would suggest that you try with
older commits of tesseract 3.05 branch to see which one works.
Hope that those who have built 3.05 on mac will help.
Please see https://github.com/tesseract-ocr/tesseract/wiki/Compiling
If you are building tesseract 4.0, you need Leptonica 1.74.
ShreeDevi
On Tue, Apr 18, 2017 at 2:25 PM, Peter Reid
Use the latest version of Leptonica, 1.74.1:
https://github.com/DanBloomberg/leptonica
ShreeDevi
On Mon, Apr 17, 2017 at 8:18 PM, Peter Reid wrote:
> I've done
Please open an issue, as the problem is related to --psm 0.
On 13-Apr-2017 9:29 AM, "Pritam Dodeja" wrote:
> Find below - I can also ship my docker container to you if you want so you
> can see my exact setup, it's about 1.15GB
>
>
See https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage
Follow the correct order of arguments:
tesseract imagename|stdin outputbase|stdout [options...] [configfile...]
ShreeDevi
LSTM training is not like legacy training. Please read the wiki pages
regarding 4.0 training. I have given all sample commands there. There are 3
different ways of training.
Read the bash scripts regarding training to know more.
tesstrain.sh with --linedata-only creates the box tiff pairs but
Arabic was never trained with the legacy tesseract engine and I doubt you
will get any improvement over existing traineddata using cube or lstm.
You are free to experiment and see what you come up with.
I have pointed to the bash scripts for training. Please refer to them for
the correct
You can use jTessBoxEditor to edit the box files. Make sure to mark EOL if
you are trying to train using scanned images.
Also note that this part of the code is untested: training 4.0 using
pre-existing images and box files.
Ray has only explained the method for using images created by text2image.
see
https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
if ((LINEDATA)); then
phase_E_extract_features "lstm.train" 8 "lstmf"
make__lstmdata
else
phase_E_extract_features "box.train" 8 "tr"
phase_C_cluster_prototypes "${TRAINING_DIR}/${LANG_CODE}.normproto"
if
Read the bash scripts in the training directory
tesstrain.sh
tesstrain_utils.sh
language-specific.sh
to understand LSTM training in more detail.
On 12-Apr-2017 10:47 AM, "Ahmad Moawad" wrote:
> this is the part from
--linedata-only means that it will only try to create lstmf files and not
the files for 3.0x training.
On 12-Apr-2017 10:39 AM, "Ahmad Moawad" wrote:
> Hello All,
>
> I want help in trainingTesseract 4.00 Finetune
>
Also, if you want training tools, you need to build them separately - see
https://github.com/tesseract-ocr/tesseract/wiki/Compiling
make training
sudo make training-install
ShreeDevi
You can ignore it. I get it too when using sudo a second time.
The host name must be the ID of your computer under Windows 10.
Have you tried running tesseract after that?
On 11-Apr-2017 4:10 PM, "Ibr" wrote:
Hi,
I'm trying to install the
I have added this at https://github.com/tesseract-ocr/langdata/issues/67
Please add more information there:
Which language code - arm or hye
Modern Armenian or Classical Armenian
Sources for primary texts in Unicode in the Armenian language to use for
training
Freely available unicode fonts to
Arabic traineddata for 3.0x uses the cube engine. The training process for that
was never shared. The cube engine has now been removed in LSTM-based 4.0, which
is still in the alpha stage.
There is 4.0alpha traineddata for Arabic and you can train for it, but
accuracy is not great. Ray is doing another training
You must be using an old version of traineddata which does not have LSTM.
On 07-Apr-2017 2:13 AM, wrote:
> I am following this link https://github.com/tesseract-ocr/tesseract/wiki/
> TrainingTesseract-4.00---Finetune
>
> For genaerating
Normally, for text output, the other config files should not have an impact.
On 07-Apr-2017 2:18 AM, "Mike Hall" wrote:
> Yes, we are using the -psm 6 command line argument. And it was not
> working.
>
> But I figured out the issue.
>
>
Have you tried --psm 6?
On 06-Apr-2017 11:06 PM, "Mike Hall" wrote:
> We have a C# .Net app that is using Tesseract to do Optical Character
> Recognition (OCR) on .tiff files. I've attached a sample tiff file.
>
> We are then
You do not have the lstm.train config file.
On 05-Apr-2017 1:55 PM, wrote:
> After u have said,
>
> I tried in two ways and i am stuck at lstm step:
>
> Training
>
> command used:
>
>
4.0 is alpha software. Please use an older released version.
On 05-Apr-2017 1:55 PM, wrote:
> After u have said,
>
> I tried in two ways and i am stuck at lstm step:
>
> Training
>
> command used:
>
>
Have you tried just using the eng.traineddata directly with tess 3.04/ 3.05
/ 4.0?
You don't need to train unless it is a very special case. You can try
changing the dictionary dawg files with tess 3.0x.
ShreeDevi
Read
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00---Finetune
tesstrain.sh generates a file called eng.training_files.txt.
You are using the command without the .txt extension.
Check the name of the generated file and use that.
I have found that editing that file also gives errors.
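Because hand-editing the list file can lead to errors, it can help to check that every path in it still exists before starting lstmtraining. Here is a minimal sketch that assumes one .lstmf path per line. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: verify that every path listed in a training-files list exists.
check_listfile() {
  local list=$1 line status=0
  if [ ! -f "$list" ]; then
    echo "no such list file: $list"
    return 1
  fi
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue      # skip blank lines
    if [ ! -f "$line" ]; then
      echo "missing: $line"
      status=1
    fi
  done < "$list"
  return "$status"
}
```

For example, check_listfile ~/tesstutorial/engtrain/eng.training_files.txt (the path is only an example) prints each missing file and returns non-zero if any were found.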
On 04-Apr-2017 7:01 PM,
See
https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh
https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh
https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh
Saurabh,
It depends on what you want to do with the bash script.
Here is a sample of a script I used to compare results using different tessdata
files by looping through a set of image files. Google the bash commands to
figure out what they do!
#!/bin/bash
set -vx
export
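The script is cut off at this point. What follows is a minimal sketch of the same idea, not the original script. It assumes PNG images in one directory and several tessdata directories that each contain traineddata for the same language. The function name, directory names and output layout are hypothetical, and TESSERACT can be set to echo for a dry run.

```shell
#!/bin/bash
# Sketch: OCR every image with each tessdata directory and keep the
# text outputs side by side, so they can be compared with diff.
TESSERACT=${TESSERACT:-tesseract}

compare_tessdata() {
  local lang=$1 images=$2 outdir=$3
  shift 3                          # remaining args: tessdata directories
  local img base td name
  for img in "$images"/*.png; do
    [ -e "$img" ] || continue      # no matches: the glob stays literal
    base=$(basename "$img" .png)
    for td in "$@"; do
      name=$(basename "$td")
      mkdir -p "$outdir/$name"
      # Order: image, output base, then options.
      "$TESSERACT" "$img" "$outdir/$name/$base" -l "$lang" --tessdata-dir "$td"
    done
  done
}
```

For example, compare_tessdata hin images results tessdata_old tessdata_new, then diff results/tessdata_old/page1.txt results/tessdata_new/page1.txt.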
jpn.config in langdata/jpn loads jpn_vert as a sublanguage:
tessedit_load_sublangs jpn_vert
You can try without that.
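One way to try this without the sublanguage is to comment out that line. Here is a minimal sketch. It assumes the config reader treats lines starting with # as comments (as far as I know, Tesseract's does), and it keeps the original file as jpn.config.orig. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch: comment out the sublanguage line in a language config file,
# keeping an untouched copy as <file>.orig.
disable_sublangs() {
  local cfg=$1
  cp "$cfg" "$cfg.orig" || return 1
  # "&" is the matched text, so the line becomes "# tessedit_load_sublangs ...".
  sed 's/^tessedit_load_sublangs/# &/' "$cfg.orig" > "$cfg"
}
```

For example, disable_sublangs langdata/jpn/jpn.config. To undo it, copy jpn.config.orig back.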
Also look at the settings for jpn in training/language-specific.sh.
You may also need to change the following:
# The following fonts will be rendered vertically in phase
You need to get vietocr 5.0 alpha for tesseract 4.0 alpha
https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/
ShreeDevi