Thanks for the script to install Tesseract on CentOS. I would suggest using traineddata files from the tessdata_fast repo (for better speed) or the tessdata_best repo (for better accuracy) instead of the ones from tessdata.
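To make that concrete, here is a rough sketch of how the download step could point at one of those repos. The function names are made up for illustration, and the URL layout simply mirrors the tessdata URLs used in the script with the repo name swapped in, so treat it as an assumption rather than a tested recipe.

```shell
# Sketch only: helper names are hypothetical, and the raw/master URL layout
# is assumed to match the tessdata URLs in the quoted script.

# Print the download URL for one language file in one repo.
traineddata_url() {
    local repo="$1" lang="$2"
    printf 'https://github.com/tesseract-ocr/%s/raw/master/%s.traineddata\n' "$repo" "$lang"
}

# Download the listed languages from a repo into a destination directory.
# Stops at the first failed download.
fetch_traineddata() {
    local repo="$1" dest="$2" lang
    shift 2
    for lang in "$@"; do
        wget -O "$dest/$lang.traineddata" "$(traineddata_url "$repo" "$lang")" || return 1
    done
}
```

For example, `fetch_traineddata tessdata_fast /usr/local/share/tessdata eng chi_sim` would replace the wget and mv lines for those two languages; swap in tessdata_best if accuracy matters more than speed.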
On Mon 23 Apr, 2018, 11:52 PM Eugene Huang, <eugeneh...@gmail.com> wrote:
> Hello! Most people are probably running Tesseract 4 on Ubuntu, MacOS, and
> Windows. Unfortunately, there are no clear instructions on installing
> Tesseract 4 for other flavors of Linux--probably most notably CentOS and
> Red Hat.
>
> After going through dependency hell, I successfully installed Tesseract 4
> on CentOS 7. I presume that the installation script should also work for
> Red Hat. I want to give credit to EisenVault because this script is
> essentially a modified version of his script. This is my first contribution
> to open source software, so any tips will be highly appreciated!
>
> When running this script line by line, you will probably have to prefix
> "sudo" to each line, or you can copy and paste it into a bash script and
> then run the script with sudo. I have tested that both work on a fresh
> image of CentOS 7 on VirtualBox.
>
> Cheers!
>
> # (Estimated Time of Completion: 45 minutes)
> # Instructions taken (and slightly modified) from
> # https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
> cd /opt
> # The following line will take 30 minutes to install.
> yum -y update
> yum -y install libstdc++ autoconf automake libtool autoconf-archive pkg-config \
>     gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel zlib-devel
> yum group install -y "Development Tools"
>
>
> # Install Leptonica from Source
> wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
> tar -zxvf leptonica-1.75.3.tar.gz
> cd leptonica-1.75.3
> ./autobuild
> ./configure
> make -j
> make install
> cd ..
> # Delete tar.gz file if you like
>
>
> # Sanity checks
> # check if libpng is installed: type "whereis libpng" and expect to see a
> # directory; a blank line is not good
> # check if leptonica is installed: type "ls /usr/local/include" and expect
> # to see "leptonica"
>
>
> # Install Tesseract from Source
> wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
> tar -zxvf 4.0.0-beta.1.tar.gz
> cd tesseract-4.0.0-beta.1/
> ./autogen.sh
> PKG_CONFIG_PATH=/usr/local/lib/pkgconfig LIBLEPT_HEADERSDIR=/usr/local/include \
>     ./configure --with-extra-includes=/usr/local/include \
>     --with-extra-libraries=/usr/local/lib
> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
> make install
> ldconfig
> cd ..
> # Delete tar.gz file if you like
>
>
> # Download and install tesseract language files (Tesseract 4 traineddata
> # files)
> wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
> wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
> wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
> wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
> # download any other languages you like
> mv *.traineddata /usr/local/share/tessdata
>
>
> # Sanity check
> # check if tesseract is installed: type "tesseract --version" and expect
> # to see 1st line (tesseract), 2nd line (leptonica), 3rd line (libraries for
> # images)
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/d41ebcc5-b3b1-4e66-af8a-c7896814a7cc%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
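
The manual sanity checks in the quoted script (looking for "leptonica" under /usr/local/include and running "tesseract --version") could also be scripted so they run at the end of the install. This is only a sketch: the function names are hypothetical, and the default prefix of /usr/local is assumed from the paths used in the script.

```shell
# Sketch only: helper names are hypothetical; /usr/local is assumed as the
# install prefix because the quoted script builds from source with defaults.

# Succeeds if the Leptonica headers directory exists under the given prefix.
has_leptonica_headers() {
    local prefix="${1:-/usr/local}"
    [ -d "$prefix/include/leptonica" ]
}

# Succeeds if a tesseract binary can be found on PATH.
has_tesseract() {
    command -v tesseract >/dev/null 2>&1
}

# Run both checks, print a short report, and return non-zero if either fails.
check_install() {
    local status=0
    if has_leptonica_headers "$1"; then
        echo "leptonica headers: ok"
    else
        echo "leptonica headers: missing"
        status=1
    fi
    if has_tesseract; then
        echo "tesseract binary: ok"
        # Show the first lines of the version banner, as in the manual check.
        tesseract --version 2>&1 | head -n 3
    else
        echo "tesseract binary: missing"
        status=1
    fi
    return "$status"
}
```

Calling `check_install` with no argument checks /usr/local; pass another prefix if Leptonica was installed elsewhere.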