Re: [tesseract-ocr] Problem facing with tesseract training 4 with arabic

2018-04-25 Thread ShreeDevi Kumar
You are trying to train only digits, but you are then using a unicharset
that contains only those digits to compress the wordlist (which uses the
Arabic alphabet) into a 'dawg'.

The command you used only creates the starter traineddata for LSTM
training. Please follow the instructions on the wiki page about training
Tesseract 4.

https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
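
The mismatch described above can be checked before the dawg is built. The
following is a minimal sketch, not part of the official training tools. It
assumes the plain-text unicharset layout (an entry count on the first line,
then one entry per line starting with the symbol) and a UTF-8 locale so that
`grep -o .` splits the wordlist into characters rather than bytes. The
function name `chars_missing_from_unicharset` is hypothetical.

```shell
# List every character that occurs in a wordlist but has no entry in a
# unicharset file. Assumption: line 1 of the unicharset is the entry count
# and each later line starts with the symbol it describes.
chars_missing_from_unicharset() {
    unicharset=$1
    wordlist=$2
    known=$(mktemp)
    # Collect the first field of every entry line (skip the count line).
    awk 'NR > 1 { print $1 }' "$unicharset" > "$known"
    # Split the wordlist into single characters, de-duplicate them, and
    # keep only those that do not match a known symbol exactly.
    grep -o . "$wordlist" | sort -u | grep -vxFf "$known" || true
    rm -f "$known"
}
```

Called as `chars_missing_from_unicharset digits.unicharset ara.wordlist`
(both file names are placeholders), any Arabic letters it prints are
characters that a digits-only unicharset cannot encode.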

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Apr 26, 2018 at 4:59 AM, Amir Raouf wrote:

> First, Arabic is read by Tesseract with good accuracy, but NO DIGITS are
> read, so I decided to train only numbers with the specific font I need.
>
> This is the question:
> https://stackoverflow.com/questions/50029477/issue-with-training-tesseract-4-0
>
> Any advice?

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduX1%2B3ZhhTrAsxqXXN%3DgCk91xKfvum7WwQGooncWNnY2Rw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] just installed, get error messages

2018-04-25 Thread Zdenko Podobny
Why are you building the project from source if you have no idea what you
are doing?

Based on your other post: you chose to build Leptonica without support for
common image formats.
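
One way to confirm this is to look at the library list that
`tesseract --version` prints after the Leptonica version. The helper below is
a minimal sketch under that assumption; the function name
`missing_image_libs` and the checklist of three libraries are illustrative
choices, not something Tesseract provides.

```shell
# Read the text printed by `tesseract --version` on stdin and print the
# names from a small checklist that do not appear anywhere in it.
missing_image_libs() {
    version_text=$(cat)
    for lib in libpng libtiff libjpeg; do
        case $version_text in
            *"$lib"*) ;;                    # library is listed, nothing to do
            *) printf '%s\n' "$lib" ;;      # library is not mentioned
        esac
    done
}
```

Typical use would be `tesseract --version 2>&1 | missing_image_libs`. If it
prints `libpng` or `libtiff`, the usual remedy is to install the matching
development packages and then re-run Leptonica's `./configure` and rebuild.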

On Thu, 26 Apr 2018, 7:01, Rolf Schumacher wrote:

> I just installed from the git repository.
>
> tesseract --version shows:
>
> rsc@rolf29 ~ $ tesseract /home/rsc/log/2018-04-26/in.png $LOGDIR
> Error in pixReadMemTiff: function not present
> Error in pixReadMem: tiff: no pix returned
> Error in pixaGenerateFontFromString: pix not made
> Error in bmfCreate: font pixa not made
> Tesseract Open Source OCR Engine v4.0.0-beta.1-192-g3c26 with Leptonica
> Error in pixReadStreamPng: function not present
> Error in pixReadStream: png: no pix returned
> Error in pixRead: pix not read
> Error during processing.
>
> When trying to OCR, I get these errors:
>
> rsc@rolf29 ~ $ tesseract /home/rsc/log/2018-04-26/in.png $LOGDIR
> Error in pixReadMemTiff: function not present
> Error in pixReadMem: tiff: no pix returned
> Error in pixaGenerateFontFromString: pix not made
> Error in bmfCreate: font pixa not made
> Tesseract Open Source OCR Engine v4.0.0-beta.1-192-g3c26 with Leptonica
> Error in pixReadStreamPng: function not present
> Error in pixReadStream: png: no pix returned
> Error in pixRead: pix not read
> Error during processing.
>
> Any ideas about what is causing this are welcome.
>
> Rolf



[tesseract-ocr] just installed, get error messages

2018-04-25 Thread Rolf Schumacher
I just installed from the git repository.

tesseract --version shows:

rsc@rolf29 ~ $ tesseract /home/rsc/log/2018-04-26/in.png $LOGDIR
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Tesseract Open Source OCR Engine v4.0.0-beta.1-192-g3c26 with Leptonica
Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
Error during processing.

When trying to OCR, I get these errors:

rsc@rolf29 ~ $ tesseract /home/rsc/log/2018-04-26/in.png $LOGDIR
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Tesseract Open Source OCR Engine v4.0.0-beta.1-192-g3c26 with Leptonica
Error in pixReadStreamPng: function not present
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
Error during processing.

Any ideas about what is causing this are welcome.

Rolf



[tesseract-ocr] Problem facing with tesseract training 4 with arabic

2018-04-25 Thread Amir Raouf
First, Arabic is read by Tesseract with good accuracy, but NO DIGITS are
read, so I decided to train only numbers with the specific font I need.

This is the question:
https://stackoverflow.com/questions/50029477/issue-with-training-tesseract-4-0

Any advice?



Re: [tesseract-ocr] error: required directory

2018-04-25 Thread Marius Amado-Alves
Zdenko, your latest fix of the makefile has solved this problem:-) Thanks a lot.



Re: [tesseract-ocr] error: required directory

2018-04-25 Thread Zdenko Podobny
We are in the middle of reorganizing Tesseract.
Using the latest code is not recommended at all, especially if you do not
follow the developers' communications.
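
The `src/api/Makefile.am` line in the reported log suggests that the source
directories had been moved under `src/` while the top-level `Makefile.am`
still referred to the old locations; that reading is an assumption based on
the log alone. The sketch below, with a hypothetical function name
`locate_source_dirs`, reports where each expected directory actually sits in
a checkout.

```shell
# For each directory name, report whether it exists at the top level of
# the source tree, under src/, or not at all.
locate_source_dirs() {
    root=$1
    shift
    for name in "$@"; do
        if [ -d "$root/$name" ]; then
            printf '%s: top level\n' "$name"
        elif [ -d "$root/src/$name" ]; then
            printf '%s: src/%s\n' "$name" "$name"
        else
            printf '%s: missing\n' "$name"
        fi
    done
}
```

For example, `locate_source_dirs . arch ccutil api training` run from the
checkout shows which of the directories named in the errors have moved.
Building from a release tarball rather than the latest code avoids this kind
of half-finished state.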



Zdenko

2018-04-25 19:59 GMT+02:00 Marius Amado-Alves:

> I am trying to install on a Mac but cannot get past the autogen.sh step.
> Any tips are highly appreciated. The current directory is /tesseract.
>
> bash-3.2# ./autogen.sh
> Running aclocal
> Running /opt/local/bin/glibtoolize
> glibtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'config'.
> glibtoolize: copying file 'config/ltmain.sh'
> glibtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
> glibtoolize: copying file 'm4/libtool.m4'
> glibtoolize: copying file 'm4/ltoptions.m4'
> glibtoolize: copying file 'm4/ltsugar.m4'
> glibtoolize: copying file 'm4/ltversion.m4'
> glibtoolize: copying file 'm4/lt~obsolete.m4'
> Running autoheader
> Running automake --add-missing --copy
> configure.ac:314: installing 'config/compile'
> configure.ac:23: installing 'config/missing'
> Makefile.am:21: error: required directory ./arch does not exist
> Makefile.am:21: error: required directory ./ccutil does not exist
> Makefile.am:21: error: required directory ./viewer does not exist
> Makefile.am:21: error: required directory ./cutil does not exist
> Makefile.am:21: error: required directory ./opencl does not exist
> Makefile.am:21: error: required directory ./ccstruct does not exist
> Makefile.am:21: error: required directory ./dict does not exist
> Makefile.am:21: error: required directory ./classify does not exist
> Makefile.am:21: error: required directory ./wordrec does not exist
> Makefile.am:21: error: required directory ./textord does not exist
> Makefile.am:21: error: required directory ./lstm does not exist
> Makefile.am:21: error: required directory ./ccmain does not exist
> Makefile.am:21: error: required directory ./api does not exist
> Makefile.am:5: error: required directory ./training does not exist
> src/api/Makefile.am: installing 'config/depcomp'
>
>   Something went wrong, bailing out!
>
> bash-3.2#



[tesseract-ocr] error: required directory

2018-04-25 Thread Marius Amado-Alves
I am trying to install on a Mac but cannot get past the autogen.sh step.
Any tips are highly appreciated. The current directory is /tesseract.

bash-3.2# ./autogen.sh
Running aclocal
Running /opt/local/bin/glibtoolize
glibtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'config'.
glibtoolize: copying file 'config/ltmain.sh'
glibtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
glibtoolize: copying file 'm4/libtool.m4'
glibtoolize: copying file 'm4/ltoptions.m4'
glibtoolize: copying file 'm4/ltsugar.m4'
glibtoolize: copying file 'm4/ltversion.m4'
glibtoolize: copying file 'm4/lt~obsolete.m4'
Running autoheader
Running automake --add-missing --copy
configure.ac:314: installing 'config/compile'
configure.ac:23: installing 'config/missing'
Makefile.am:21: error: required directory ./arch does not exist
Makefile.am:21: error: required directory ./ccutil does not exist
Makefile.am:21: error: required directory ./viewer does not exist
Makefile.am:21: error: required directory ./cutil does not exist
Makefile.am:21: error: required directory ./opencl does not exist
Makefile.am:21: error: required directory ./ccstruct does not exist
Makefile.am:21: error: required directory ./dict does not exist
Makefile.am:21: error: required directory ./classify does not exist
Makefile.am:21: error: required directory ./wordrec does not exist
Makefile.am:21: error: required directory ./textord does not exist
Makefile.am:21: error: required directory ./lstm does not exist
Makefile.am:21: error: required directory ./ccmain does not exist
Makefile.am:21: error: required directory ./api does not exist
Makefile.am:5: error: required directory ./training does not exist
src/api/Makefile.am: installing 'config/depcomp'

  Something went wrong, bailing out!

bash-3.2#




Re: [tesseract-ocr] tesseract performs wrong auto-correction sometimes : how to disable it?

2018-04-25 Thread ShreeDevi Kumar
Which version of tesseract are you using?

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, Apr 25, 2018 at 8:29 PM, Youcef wrote:

> Hi,
>
>
> Tesseract seems to post-process its predictions.
>
> Below is what I get after running OCR on images (same font, same size,
> generated with text2image):
>
> - an image containing "12345678I" => `123456781`
> - an image containing "GLOTHUVFI" => `GLOTHUVFI`
> - an image containing "12345678H" => `12345678H`
> - an image containing "GLOTHUVFH" => `GLOTHUVFH`
> - an image containing "12345678A" => `123456784`
> - an image containing "GLOTHUVFA" => `GLOTHUVFA`
>
> It looks like Tesseract doesn't like a word made of digits with a single
> letter at the end. In fact, if the letter looks like a digit ("I" and "A"
> look like "1" and "4" respectively), it replaces it with the closest digit.
> I have tried tuning the following parameters without any change in the
> result:
>
> - segment_penalty_dict_frequent_word
> - language_model_penalty_chartype
>
> Thanks for any help.
>
> Regards



[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-25 Thread Александр Поздняков
For CentOS:

> yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
> yum update
> yum install tesseract


For example, to install the German language pack:

> yum install tesseract-langpack-deu



On Wednesday, April 25, 2018 at 16:30:01 UTC+3, Eugene Huang wrote:
>
> Hello Александр!
>
> I took a look at your stuff; it is very extensive. If all the
> installations work, this should be front-paged! I have never used openSUSE.
> Could you point me to some resources to figure out how to use your
> installation packages?
>
>
> @shree
> Thanks for the info. I definitely notice that Tesseract 4 is more
> accurate. For example, Tesseract 4 can read small italic fonts, whereas
> Tesseract 3 makes lots of mistakes. Seems like Tesseract 4 is the future!
>



[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-25 Thread shree
Thanks for the rpm package, Alex. I have added the info 
to https://github.com/tesseract-ocr/tesseract/wiki 

On Tuesday, April 24, 2018 at 10:04:55 PM UTC+5:30, Александр Поздняков 
wrote:
>
> Hi. I built RPM packages of tesseract-ocr for CentOS, Fedora,
> Scientific Linux and openSUSE. They still need to be tested...
> https://build.opensuse.org/project/show/home:Alexander_Pozdnyakov
>
> On Monday, April 23, 2018 at 21:22:40 UTC+3, Eugene Huang wrote:
>>
>> Hello! Most people are probably running Tesseract 4 on Ubuntu, MacOS, and 
>> Windows. Unfortunately, there are no clear instructions on installing 
>> Tesseract 4 for other flavors of Linux--probably most notably CentOS and 
>> Red Hat.
>>
>> After going through dependency hell, I successfully installed Tesseract 4 
>> onto CentOS 7. I presume that the installation script should also work for 
>> Red Hat. I want to give credit to EisenVault because this script is 
>> essentially a modified version of his script. This is my first contribution 
>> to open source software, so any tips will be highly appreciated!
>>
>> When running this script line by line, you probably have to prefix "sudo" 
>> to each line, or you can copy and paste into a bash script and then run 
>> sudo along with the script. I have tested both to work on a fresh image of 
>> CentOS 7 on VirtualBox.
>>
>> Cheers!
>>
>> # (Estimated Time of Completion: 45 minutes)
>> # Instructions taken (and slightly modified) from
>> # https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
>> cd /opt
>> # The following line will take 30 minutes to install.
>> yum -y update
>> yum -y install libstdc++ autoconf automake libtool autoconf-archive \
>>     pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel \
>>     zlib-devel
>> yum group install -y "Development Tools"
>>
>>
>> # Install Leptonica from Source
>> wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
>> tar -zxvf leptonica-1.75.3.tar.gz
>> cd leptonica-1.75.3
>> ./autobuild
>> ./configure
>> make -j
>> make install
>> cd ..
>> # Delete tar.gz file if you like
>>
>>
>> # Sanity checks
>> # check if libpng is installed: type "whereis libpng" and expect to see a
>> # directory; a blank line is not good
>> # check if leptonica is installed: type "ls /usr/local/include" and
>> # expect to see "leptonica"
>>
>>
>> # Install Tesseract from Source
>> wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
>> tar -zxvf 4.0.0-beta.1.tar.gz
>> cd tesseract-4.0.0-beta.1/
>> ./autogen.sh
>> PKG_CONFIG_PATH=/usr/local/lib/pkgconfig \
>>     LIBLEPT_HEADERSDIR=/usr/local/include \
>>     ./configure --with-extra-includes=/usr/local/include \
>>     --with-extra-libraries=/usr/local/lib
>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
>> make install
>> ldconfig
>> cd ..
>> # Delete tar.gz file if you like
>>
>>
>> # Download and install tesseract language files (Tesseract 4 traineddata
>> # files)
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
>> wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
>> # download any other languages you like
>> mv *.traineddata /usr/local/share/tessdata
>>
>>
>> # Sanity check
>> # check if tesseract is installed: type "tesseract --version" and expect
>> # to see 1st line (tesseract), 2nd line (leptonica), 3rd line (libraries for
>> # images)
>>
>>



[tesseract-ocr] tesseract performs wrong auto-correction sometimes : how to disable it?

2018-04-25 Thread Youcef
Hi,


Tesseract seems to post-process its predictions.

Below is what I get after running OCR on images (same font, same size,
generated with text2image):

- an image containing "12345678I" => `123456781`
- an image containing "GLOTHUVFI" => `GLOTHUVFI`
- an image containing "12345678H" => `12345678H`
- an image containing "GLOTHUVFH" => `GLOTHUVFH`
- an image containing "12345678A" => `123456784`
- an image containing "GLOTHUVFA" => `GLOTHUVFA`

It looks like Tesseract doesn't like a word made of digits with a single
letter at the end. In fact, if the letter looks like a digit ("I" and "A"
look like "1" and "4" respectively), it replaces it with the closest digit.
I have tried tuning the following parameters without any change in the
result:

- segment_penalty_dict_frequent_word
- language_model_penalty_chartype
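
The test cases above can be turned into a small repeatable check, which
makes it easier to see whether a parameter change has any effect. This is
only a sketch: the function names `ocr_roundtrip` and `report_mismatches`
are hypothetical, the font name and fonts directory are placeholders, and it
assumes that `text2image` writes `<outputbase>.tif` and that `tesseract`
accepts `stdout` as the output base and `-c name=value` for parameters.

```shell
# Render one string with text2image, OCR it, and print "expected actual".
# Extra arguments after the first three are passed straight to tesseract.
ocr_roundtrip() {
    text=$1
    font=$2
    fonts_dir=$3
    shift 3
    work=$(mktemp -d)
    printf '%s\n' "$text" > "$work/in.txt"
    text2image --text="$work/in.txt" --outputbase="$work/img" \
        --font="$font" --fonts_dir="$fonts_dir" > /dev/null 2>&1
    # Keep only the first line of the recognized text.
    actual=$(tesseract "$work/img.tif" stdout "$@" 2> /dev/null | head -n 1)
    rm -rf "$work"
    printf '%s %s\n' "$text" "$actual"
}

# Read "expected actual" pairs on stdin and print only the pairs that differ.
report_mismatches() {
    while read -r expected actual; do
        if [ "$expected" != "$actual" ]; then
            printf '%s -> %s\n' "$expected" "$actual"
        fi
    done
}
```

A run might look like
`ocr_roundtrip 12345678I "Arial" /usr/share/fonts -c language_model_penalty_chartype=0.5 | report_mismatches`,
where the font, the directory, and the value 0.5 are arbitrary examples.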

Thanks for any help.

Regards



Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Thread Zdenko Podobny
Well, you should contact the creator of the traineddata. We have no idea
what they did.

Zdenko

2018-04-25 14:55 GMT+02:00 :

> Hello there,
>
> I don't know what to do anymore...
> I want to use tesseract-ocr 3.05 for scanning documents that use the font
> "Perfect DOS VGA 437 Win".
> I got a traineddata file for my font from trainyourtesseract.com. It
> actually works really well, but the letter "d" is never recognized; "a" or
> "u" is output instead, e.g. "Gemeinue" instead of "Gemeinde".
>
> Adding my words to freq-dawg didn't change anything.
> I also tried to train Tesseract with a new language using this font, but
> the result is even worse.
> Combining the languages perfect+deu gives some words with a correct "d" and
> many wrong ones.
>
> Is there anyone who can help me please?
>
> I'm completely desperate. :-(
>
> Sorry for bad english and best regards
>
> Dave



[tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Thread dave . hardy
Hello there, 

I don't know what to do anymore...
I want to use tesseract-ocr 3.05 for scanning documents that use the font
"Perfect DOS VGA 437 Win".
I got a traineddata file for my font from trainyourtesseract.com. It actually
works really well, but the letter "d" is never recognized; "a" or "u" is
output instead, e.g. "Gemeinue" instead of "Gemeinde".

Adding my words to freq-dawg didn't change anything.
I also tried to train Tesseract with a new language using this font, but the
result is even worse.
Combining the languages perfect+deu gives some words with a correct "d" and
many wrong ones.

Is there anyone who can help me please? 

I'm completely desperate. :-(

Sorry for bad english and best regards

Dave



[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-25 Thread Eugene Huang
Hello Александр!

I took a look at your stuff; it is very extensive. If all the installations
work, this should be front-paged! I have never used openSUSE. Could you
point me to some resources to figure out how to use your installation packages?


@shree
Thanks for the info. I definitely notice that Tesseract 4 is more
accurate. For example, Tesseract 4 can read small italic fonts, whereas
Tesseract 3 makes lots of mistakes. Seems like Tesseract 4 is the future!
