Re: [tesseract-ocr] Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-24 Thread ShreeDevi Kumar
I have never used equ.traineddata. From feedback in the forum, I don't think
it works very well. Maybe equ has not been trained via LSTM training; I
have no way of knowing. Only Ray Smith or other developers from Google can
answer that.

Only LSTM models exist in tessdata_best and tessdata_fast.

Depending on the language and the hardware that you are running on,
tesseract 4 can be slower than tesseract 3 - see the various performance
issues on GitHub. However, accuracy has improved a lot, and more
languages are available for tesseract 4.
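
One way to see the difference on your own hardware is to time the same image against each set of traineddata files. A minimal sketch, assuming a POSIX shell; the helper name elapsed_seconds, the image page.png, and both directory paths are placeholders and not something from this thread:

```shell
# Hypothetical helper: run a command, discard its output, and print the
# wall-clock time it took in whole seconds. Returns non-zero if the
# command itself fails, so a broken run is not mistaken for a fast one.
elapsed_seconds() {
    es_start=$(date +%s)
    if ! "$@" >/dev/null 2>&1; then
        echo "command failed: $*" >&2
        return 1
    fi
    es_end=$(date +%s)
    echo $(( es_end - es_start ))
}

# Example (not run here; page.png and both directories are placeholders):
#   elapsed_seconds tesseract page.png stdout --tessdata-dir /opt/tessdata      --oem 1
#   elapsed_seconds tesseract page.png stdout --tessdata-dir /opt/tessdata_fast --oem 1
```

Whole seconds are coarse, so this only makes sense for a large page or a batch of pages; for finer numbers, the shell's own "time" keyword may be a better fit.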

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


Re: [tesseract-ocr] Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-24 Thread Eugene Huang
@Shree
Thanks for the tip. Just 2 quick questions. 
1) From https://github.com/tesseract-ocr/tesseract/wiki/Data-Files, it says 
that "osd" and "equ" traineddata files are compatible between Tesseract 3 
and 4. In the GitHub tessdata_fast repo 
(https://github.com/tesseract-ocr/tessdata_fast), "osd" is there with the 
commit "Use legacy Orientation Script Detector (OSD) because that is the 
only thing that currently works." However, "equ" is not in the repo. Was 
this simply a small mistake where the maintainer forgot to include the 
"equ" data file?

2) Also, with tessdata_fast, I was able to get Tesseract 4 running faster 
than Tesseract 4 with tessdata. However, is Tesseract 4 supposed to 
be slower than Tesseract 3? That is what I'm experiencing.




# Here are the updated instructions to download tessdata_fast, which in my
# tests does run faster than tessdata.
# However, when calling Tesseract from the command line, the argument
# "--oem 2" will no longer work.
# Use "--oem 1", since tessdata_fast contains only the neural net LSTM model.
# The URLs are quoted and -O names the output file; otherwise wget would save
# each file with "?raw=true" appended to its name.
wget -O osd.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/osd.traineddata?raw=true"
wget -O eng.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata?raw=true"
wget -O chi_sim.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/chi_sim.traineddata?raw=true"
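
If you want tessdata and tessdata_fast side by side, the downloads could be wrapped in a small helper that takes the repo and the language codes as arguments. This is only a sketch: the function names and the /opt/tessdata_fast path are made up for illustration, and it assumes the "raw/master" URL layout used for the tessdata repo in the original script also applies to tessdata_fast.

```shell
# Assumption: the "raw/master" URL layout from the original install script
# works for the other traineddata repos as well.
traineddata_url() {
    # $1 = repo name (tessdata, tessdata_fast, tessdata_best)
    # $2 = language code (eng, osd, chi_sim, ...)
    printf 'https://github.com/tesseract-ocr/%s/raw/master/%s.traineddata\n' "$1" "$2"
}

# Hypothetical helper: download the given languages from one repo into a
# directory, naming each output file explicitly.
fetch_traineddata() {
    fetch_repo=$1
    fetch_dest=$2
    shift 2
    mkdir -p "$fetch_dest" || return 1
    for fetch_lang in "$@"; do
        wget -O "$fetch_dest/$fetch_lang.traineddata" \
            "$(traineddata_url "$fetch_repo" "$fetch_lang")" || return 1
    done
}

# Example (not run here; the directory and page.png are placeholders):
#   fetch_traineddata tessdata_fast /opt/tessdata_fast osd eng chi_sim
#   tesseract page.png out --tessdata-dir /opt/tessdata_fast --oem 1 -l eng
```

Keeping each repo in its own directory and selecting it with --tessdata-dir means nothing in /usr/local/share/tessdata gets overwritten while comparing.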

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/47f3b497-84fb-4aed-9766-877053e8a293%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-23 Thread ShreeDevi Kumar
Thanks for the script to install tesseract on CentOS.

I would suggest using traineddata files from tessdata_fast or tessdata_best
repos for better accuracy and speed.



[tesseract-ocr] Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-23 Thread Eugene Huang
Hello! Most people are probably running Tesseract 4 on Ubuntu, macOS, and 
Windows. Unfortunately, there are no clear instructions for installing 
Tesseract 4 on other flavors of Linux, most notably CentOS and 
Red Hat.

After going through dependency hell, I successfully installed Tesseract 4 
onto CentOS 7. I presume that the installation script should also work for 
Red Hat. I want to give credit to EisenVault because this script is 
essentially a modified version of his script. This is my first contribution 
to open source software, so any tips will be highly appreciated!

If you run this script line by line, you will probably have to prefix each 
line with "sudo". Alternatively, paste it into a bash script and run that 
script with sudo. I have tested both approaches on a fresh image of 
CentOS 7 in VirtualBox.
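
For the second approach, a guard at the top of the script can refuse to run without root instead of failing halfway through. A minimal sketch; the function name is_root and the file name install-tesseract.sh are only illustrative:

```shell
# Hypothetical guard: succeed only when the given numeric user id
# (default: the current user's id) is 0, i.e. root.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Intended use at the top of a script saved as, say, install-tesseract.sh:
#   set -e    # stop at the first failing command
#   is_root || { echo "run with: sudo bash install-tesseract.sh" >&2; exit 1; }
```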

Cheers!

# (Estimated Time of Completion: 45 minutes)
# Instructions taken (and slightly modified) from
# https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
cd /opt
# The following line will take 30 minutes to install.
yum -y update 
yum -y install libstdc++ autoconf automake libtool autoconf-archive pkg-config \
    gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
yum group install -y "Development Tools"


# Install Leptonica from Source
wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
tar -zxvf leptonica-1.75.3.tar.gz
cd leptonica-1.75.3
./autobuild
./configure
make -j
make install
cd ..
# Delete tar.gz file if you like


# Sanity checks
# check if libpng is installed: type "whereis libpng" and expect to see a
# directory; a blank line is not good
# check if leptonica is installed: type "ls /usr/local/include" and expect
# to see "leptonica"


# Install Tesseract from Source
wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
tar -zxvf 4.0.0-beta.1.tar.gz
cd tesseract-4.0.0-beta.1/
./autogen.sh
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig LIBLEPT_HEADERSDIR=/usr/local/include \
    ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
make install
ldconfig
cd ..
# Delete tar.gz file if you like


# Download and install tesseract language files (Tesseract 4 traineddata
# files)
wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
# download any other languages you like
mv *.traineddata /usr/local/share/tessdata


# Sanity check
# check if tesseract is installed: type "tesseract --version" and expect to
# see 1st line (tesseract), 2nd line (leptonica), 3rd line (libraries for
# images)
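
The last sanity check can also be scripted rather than eyeballed. A rough sketch, assuming the output layout described above (tesseract on line 1, leptonica on line 2); the function name version_output_ok is made up for illustration:

```shell
# Hypothetical helper: read "tesseract --version" output on stdin and succeed
# only if line 1 starts with "tesseract" and line 2 mentions "leptonica".
version_output_ok() {
    awk 'NR == 1 && $0 !~ /^tesseract/ { bad = 1 }
         NR == 2 && $0 !~ /leptonica/  { bad = 1 }
         END { exit (bad || NR < 2) }'
}

# Example (not run here); 2>&1 is there in case the version text goes to stderr:
#   tesseract --version 2>&1 | version_output_ok && echo "install looks fine"
```

It only looks at the first two lines, so it says nothing about which image libraries were linked in; check the third line by hand for that.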
