I have never used equ.traineddata. Judging from feedback in the forum, I don't think it works very well. Maybe equ has not been trained via LSTM training; I have no way of knowing. Only Ray Smith or other developers from Google can answer that.
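That said, a partial check is possible from the file itself. The combine_tessdata tool that ships with Tesseract 4 can list the components packed into a traineddata file (`combine_tessdata -d`), and a file carrying an LSTM model should list an `lstm` component. The sketch below is only that, a sketch: the helper name `has_lstm_component` is hypothetical, and the listing format it matches (`<index>:lstm:size=...`) is an assumption that may need adjusting for your build.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): reads a component listing on
# stdin, as printed by `combine_tessdata -d <file>.traineddata`, and succeeds
# if an "lstm" component appears in it.
# Assumption: components are listed one per line as "<index>:<name>:size=...".
has_lstm_component() {
    grep -Eq '^[0-9]+:lstm:'
}

# Example (the path is the install location used later in this thread).
# 2>&1 is included because the listing may be written to stderr:
#   combine_tessdata -d /usr/local/share/tessdata/equ.traineddata 2>&1 \
#       | has_lstm_component && echo "lstm component listed"
```

A file with no `lstm` entry would only be usable with the legacy engine, which would also explain why it is absent from repos that hold only LSTM models.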
Only LSTM models exist in tessdata_best and tessdata_fast.

Depending on the language and the hardware that you are running on, tesseract 4 can be slower than tesseract 3 - see various issues related to performance on GitHub. However accuracy has improved a lot and a larger number of languages are available for tesseract 4.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Apr 24, 2018 at 9:07 PM, Eugene Huang <eugeneh...@gmail.com> wrote:

> @Shree
> Thanks for the tip. Just 2 quick questions.
> 1) From https://github.com/tesseract-ocr/tesseract/wiki/Data-Files, it
> says that "osd" and "equ" traineddata files are compatible between
> Tesseract 3 and 4. In the GitHub tessdata_fast repo
> (https://github.com/tesseract-ocr/tessdata_fast), "osd" is there with the
> commit "Use legacy Orientation Script Detector (OSD) because that is the
> only thing that currently works." However, "equ" is not in the repo. Was
> this simply a small mistake where the maintainer forgot to include the
> "equ" data file?
>
> 2) Also, with tessdata_fast, I was able to get Tesseract 4 running faster
> than using Tesseract 4 with tessdata. However, is Tesseract 4 supposed to
> be slower than Tesseract 3 because that's what I'm experiencing?
>
> # Here are the updated instructions to download tessdata_fast, which I
> # tested to indeed perform faster than tessdata.
> # However, when calling Tesseract from the command line, using the
> # arguments "--oem 2" will no longer work.
> # Use "--oem 1" since only the neural net LSTM model exists if using
> # tessdata_fast.
> wget https://github.com/tesseract-ocr/tessdata_fast/blob/master/osd.traineddata?raw=true
> wget https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata?raw=true
> wget https://github.com/tesseract-ocr/tessdata_fast/blob/master/chi_sim.traineddata?raw=true
>
> On Monday, April 23, 2018 at 2:37:09 PM UTC-4, shree wrote:
>>
>> Thanks for the script to install tesseract on CentOS.
>>
>> I would suggest using traineddata files from tessdata_fast or
>> tessdata_best repos for better accuracy and speed.
>>
>> On Mon 23 Apr, 2018, 11:52 PM Eugene Huang, <eugen...@gmail.com> wrote:
>>
>>> Hello! Most people are probably running Tesseract 4 on Ubuntu, MacOS,
>>> and Windows. Unfortunately, there are no clear instructions on installing
>>> Tesseract 4 for other flavors of Linux--probably most notably CentOS and
>>> Red Hat.
>>>
>>> After going through dependency hell, I successfully installed Tesseract
>>> 4 onto CentOS 7. I presume that the installation script should also work
>>> for Red Hat. I want to give credit to EisenVault because this script is
>>> essentially a modified version of his script. This is my first contribution
>>> to open source software, so any tips will be highly appreciated!
>>>
>>> When running this script line by line, you probably have to prefix
>>> "sudo" to each line, or you can copy and paste into a bash script and then
>>> run sudo along with the script. I have tested both to work on a fresh image
>>> of CentOS 7 on VirtualBox.
>>>
>>> Cheers!
>>>
>>> # (Estimated Time of Completion: 45 minutes)
>>> # Instructions taken (and slightly modified) from
>>> # https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
>>> cd /opt
>>> # The following line will take 30 minutes to install.
>>> yum -y update
>>> yum -y install libstdc++ autoconf automake libtool autoconf-archive \
>>>     pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel \
>>>     libtiff-devel zlib-devel
>>> yum group install -y "Development Tools"
>>>
>>>
>>> # Install Leptonica from Source
>>> wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
>>> tar -zxvf leptonica-1.75.3.tar.gz
>>> cd leptonica-1.75.3
>>> ./autobuild
>>> ./configure
>>> make -j
>>> make install
>>> cd ..
>>> # Delete tar.gz file if you like
>>>
>>>
>>> # Sanity checks
>>> # check if libpng is installed: type "whereis libpng" and expect to see
>>> # a directory; a blank line is not good
>>> # check if leptonica is installed: type "ls /usr/local/include" and
>>> # expect to see "leptonica"
>>>
>>>
>>> # Install Tesseract from Source
>>> wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
>>> tar -zxvf 4.0.0-beta.1.tar.gz
>>> cd tesseract-4.0.0-beta.1/
>>> ./autogen.sh
>>> # The two variables are passed on the same command so that ./configure sees them
>>> PKG_CONFIG_PATH=/usr/local/lib/pkgconfig \
>>> LIBLEPT_HEADERSDIR=/usr/local/include \
>>> ./configure --with-extra-includes=/usr/local/include --with-extra-libraries=/usr/local/lib
>>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
>>> make install
>>> ldconfig
>>> cd ..
>>> # Delete tar.gz file if you like
>>>
>>>
>>> # Download and install tesseract language files (Tesseract 4 traineddata
>>> # files)
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
>>> # download any other languages you like
>>> mv *.traineddata /usr/local/share/tessdata
>>>
>>>
>>> # Sanity check
>>> # check if tesseract is installed: type "tesseract --version" and expect
>>> # to see 1st line (tesseract), 2nd line (leptonica), 3rd line (libraries for
>>> # images)
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/d41ebcc5-b3b1-4e66-af8a-c7896814a7cc%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
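A note on the tessdata_fast download step quoted above: with a `blob/master/...?raw=true` URL, wget normally names the saved file after the last URL component, query string included (for example `eng.traineddata?raw=true`), so Tesseract would not find it under the expected name. Below is a minimal sketch of one way around this, reusing the `/raw/master/` URL form from the original install script together with an explicit output name. The function names and the target directory in the example are illustrative assumptions, and whether the `master` path still resolves for tessdata_fast has not been verified here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Tesseract.

# Print the download URL for one language in the tessdata_fast repo.
# Assumption: the /raw/master/ path pattern used for tessdata in the
# original script also applies to tessdata_fast.
tessdata_fast_url() {
    printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/master/%s.traineddata\n' "$1"
}

# Download the given languages into a tessdata directory.
# Usage: fetch_tessdata_fast DIR LANG...
fetch_tessdata_fast() {
    dest_dir=$1
    shift
    for lang in "$@"; do
        # -O fixes the local file name regardless of redirects or query strings.
        wget -O "$dest_dir/$lang.traineddata" "$(tessdata_fast_url "$lang")" || return 1
    done
}

# Example (run as root, as in the thread; image.png and out are placeholders):
#   fetch_tessdata_fast /usr/local/share/tessdata osd eng chi_sim
#   tesseract image.png out --oem 1 -l eng
```

Passing `--oem 1` in the example follows the advice in the thread, since the tessdata_fast language files contain only the LSTM model.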