Hello! Most people are probably running Tesseract 4 on Ubuntu, macOS, or
Windows. Unfortunately, there are no clear instructions for installing
Tesseract 4 on other flavors of Linux, most notably CentOS and Red Hat.

After going through dependency hell, I successfully installed Tesseract 4 
onto CentOS 7. I presume that the installation script should also work for 
Red Hat. I want to give credit to EisenVault because this script is 
essentially a modified version of his script. This is my first contribution 
to open source software, so any tips will be highly appreciated!

When running this script line by line, you will probably have to prefix each
command with "sudo" (except "cd", which is a shell builtin). Alternatively,
paste everything into a bash script and run that script with sudo. I have
tested both approaches on a fresh image of CentOS 7 in VirtualBox.
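
For the second approach, a root check at the top of the script can catch a forgotten sudo before anything is downloaded. The following is only a minimal sketch: the function name is_root_uid and the file name install-tesseract.sh are hypothetical and are not part of EisenVault's script or mine.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a wrapper script (assumed name: install-tesseract.sh).

# Succeeds only when the given numeric user ID is 0 (root).
# Falls back to the current user's ID when no argument is passed.
is_root_uid() {
    local uid="${1:-$(id -u)}"
    [ "$uid" -eq 0 ]
}

# Example guard for the top of the script. It is left commented out so that
# pasting this sketch into an interactive shell does not close the session:
#   is_root_uid || { echo "Run with: sudo bash install-tesseract.sh" >&2; exit 1; }
#   set -e   # stop at the first command that fails
```

With the guard enabled, the commands that follow can stay exactly as written, without a "sudo" prefix on each line.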

Cheers!

# (Estimated Time of Completion: 45 minutes)
# Instructions taken (and slightly modified) from:
# https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
cd /opt
# The following line will take 30 minutes to install.
yum -y update 
yum -y install libstdc++ autoconf automake libtool autoconf-archive pkg-config \
    gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
yum group install -y "Development Tools"


# Install Leptonica from Source
wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
tar -zxvf leptonica-1.75.3.tar.gz
cd leptonica-1.75.3
./autobuild
./configure
make -j
make install
cd ..
# Delete tar.gz file if you like


# Sanity checks
# Check that libpng is installed: run "whereis libpng" and expect to see at
# least one path after "libpng:"; no path means it is missing.
# Check that leptonica is installed: run "ls /usr/local/include" and expect
# to see "leptonica".


# Install Tesseract from Source
wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
tar -zxvf 4.0.0-beta.1.tar.gz
cd tesseract-4.0.0-beta.1/
./autogen.sh
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig LIBLEPT_HEADERSDIR=/usr/local/include \
    ./configure --with-extra-includes=/usr/local/include \
    --with-extra-libraries=/usr/local/lib
LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
make install
ldconfig
cd ..
# Delete tar.gz file if you like


# Download and install tesseract language files (Tesseract 4 traineddata
# files)
wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
# Download any other languages you like
mv *.traineddata /usr/local/share/tessdata


# Sanity check
# Check that tesseract is installed: run "tesseract --version" and expect to
# see, in order, a tesseract line, a leptonica line, and a line listing the
# image libraries.
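
The sanity checks can also be run as a short script instead of by eye. This is a rough sketch rather than part of the tested installation: the helper names report_command and report_path are made up for illustration, and the paths are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical post-install report; it only prints results and changes nothing.

# Print whether a command can be found on PATH.
report_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found"
    else
        echo "missing: $1"
    fi
}

# Print whether a file or directory exists.
report_path() {
    if [ -e "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

report_command tesseract
report_path /usr/local/include/leptonica
for lang in osd equ eng chi_sim; do
    report_path "/usr/local/share/tessdata/$lang.traineddata"
done
```

If every line starts with "ok:", running "tesseract --list-langs" should then list the same four languages.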
