I have never used equ.traineddata, and from feedback in the forum I don't
think it works very well. It may not have been retrained with LSTM, but I
have no way of knowing; only Ray Smith or other developers at Google can
answer that.

Only LSTM models exist in tessdata_best and tessdata_fast.
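
As a rough sketch of what that means on the command line, the helper below
maps a model set to an "--oem" value. pick_oem is a hypothetical name, not
part of Tesseract, and it assumes the tessdata repo carries both legacy and
LSTM data while tessdata_fast and tessdata_best are LSTM-only.

```shell
# pick_oem: hypothetical helper (not part of Tesseract) that suggests an
# --oem value for a given traineddata repository.
pick_oem() {
  case "$1" in
    tessdata_fast|tessdata_best)
      echo 1 ;;   # LSTM engine only
    tessdata)
      echo 2 ;;   # legacy + LSTM engines combined
    *)
      echo "unknown model set: $1" >&2
      return 1 ;;
  esac
}

# Example (assumes tesseract is installed and page.png exists):
#   tesseract page.png out --oem "$(pick_oem tessdata_fast)" -l eng
```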

Depending on the language and the hardware you are running on, tesseract 4
can be slower than tesseract 3; see the various performance-related issues
on GitHub. However, accuracy has improved considerably, and more languages
are available for tesseract 4.
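
To compare speed on your own hardware, something like the following could
be used. It is only a sketch: elapsed_seconds is a hypothetical helper, the
file and directory names in the example are placeholders, and whole-second
resolution is only good enough for rough comparisons.

```shell
# elapsed_seconds: run a command, discard its output, and print how many
# whole seconds it took. Hypothetical helper for rough comparisons only.
elapsed_seconds() {
  es_start=$(date +%s)
  "$@" >/dev/null 2>&1
  es_status=$?
  es_end=$(date +%s)
  echo $((es_end - es_start))
  return "$es_status"
}

# Example (assumes tesseract is installed, page.png exists, and the two
# directories hold the respective traineddata files):
#   elapsed_seconds tesseract page.png out_fast --tessdata-dir ./tessdata_fast --oem 1 -l eng
#   elapsed_seconds tesseract page.png out_std  --tessdata-dir ./tessdata      --oem 1 -l eng
```

The helper returns the exit status of the command it timed, so a failed
OCR run is not mistaken for a fast one.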

ShreeDevi

On Tue, Apr 24, 2018 at 9:07 PM, Eugene Huang <eugeneh...@gmail.com> wrote:

> @Shree
> Thanks for the tip. Just 2 quick questions.
> 1) From https://github.com/tesseract-ocr/tesseract/wiki/Data-Files, it
> says that "osd" and "equ" traineddata files are compatible between
> Tesseract 3 and 4. In the GitHub tessdata_fast repo (
> https://github.com/tesseract-ocr/tessdata_fast), "osd" is there with the
> commit "Use legacy Orientation Script Detector (OSD) because that is the
> only thing that currently works." However, "equ" is not in the repo. Was
> this simply a small mistake where the maintainer forgot to include the
> "equ" data file?
>
> 2) Also, with tessdata_fast, I was able to get Tesseract 4 running faster
> than with tessdata. However, is Tesseract 4 expected to be slower than
> Tesseract 3? That is what I'm experiencing.
>
>
>
>
> # Here are the updated instructions for downloading tessdata_fast, which I
> # tested and found to run faster than tessdata.
> # Note that "--oem 2" no longer works on the command line with
> # tessdata_fast. Use "--oem 1" instead, since tessdata_fast contains only
> # the neural-net LSTM model.
> # -O keeps "?raw=true" out of the saved file names
> wget -O osd.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/osd.traineddata?raw=true"
> wget -O eng.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata?raw=true"
> wget -O chi_sim.traineddata "https://github.com/tesseract-ocr/tessdata_fast/blob/master/chi_sim.traineddata?raw=true"
>
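The downloads above could be generalized along these lines. This is a
sketch only: traineddata_url and fetch_models are hypothetical helper
names, and the URL pattern (including the "master" branch) simply follows
the links used elsewhere in this thread.

```shell
# traineddata_url: hypothetical helper that builds the raw-download URL for
# one language file in one of the tesseract-ocr model repositories.
traineddata_url() {
  repo=$1   # tessdata, tessdata_fast, or tessdata_best
  lang=$2   # e.g. eng, osd, chi_sim
  echo "https://github.com/tesseract-ocr/${repo}/raw/master/${lang}.traineddata"
}

# fetch_models: download each listed language into a target directory,
# naming each file explicitly so no query string ends up in the file name.
fetch_models() {
  repo=$1; dest=$2; shift 2
  for lang in "$@"; do
    wget -O "${dest}/${lang}.traineddata" "$(traineddata_url "$repo" "$lang")" || return 1
  done
}

# Example (needs network access and write permission on the directory):
#   fetch_models tessdata_fast /usr/local/share/tessdata osd eng chi_sim
```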
>
> On Monday, April 23, 2018 at 2:37:09 PM UTC-4, shree wrote:
>>
>> Thanks for the script to install tesseract on CentOS.
>>
>> I would suggest using traineddata files from tessdata_fast or
>> tessdata_best repos for better accuracy and speed.
>>
>> On Mon 23 Apr, 2018, 11:52 PM Eugene Huang, <eugen...@gmail.com> wrote:
>>
>>> Hello! Most people are probably running Tesseract 4 on Ubuntu, MacOS,
>>> and Windows. Unfortunately, there are no clear instructions on installing
>>> Tesseract 4 for other flavors of Linux--probably most notably CentOS and
>>> Red Hat.
>>>
>>> After going through dependency hell, I successfully installed Tesseract
>>> 4 onto CentOS 7. I presume that the installation script should also work
>>> for Red Hat. I want to give credit to EisenVault because this script is
>>> essentially a modified version of his script. This is my first contribution
>>> to open source software, so any tips will be highly appreciated!
>>>
>>> When running this script line by line, you probably have to prefix
>>> "sudo" to each line, or you can copy and paste into a bash script and then
>>> run sudo along with the script. I have tested both to work on a fresh image
>>> of CentOS 7 on VirtualBox.
>>>
>>> Cheers!
>>>
>>> # (Estimated Time of Completion: 45 minutes)
>>> # Instructions taken (and slightly modified) from
>>> # https://github.com/EisenVault/install-tesseract-redhat-centos/blob/master/install-tesseract.sh
>>> cd /opt
>>> # The following line will take 30 minutes to install.
>>> yum -y update
>>> yum -y install libstdc++ autoconf automake libtool autoconf-archive \
>>>     pkg-config gcc gcc-c++ make libjpeg-devel libpng-devel libtiff-devel \
>>>     zlib-devel
>>> yum group install -y "Development Tools"
>>>
>>>
>>> # Install Leptonica from Source
>>> wget http://www.leptonica.com/source/leptonica-1.75.3.tar.gz
>>> tar -zxvf leptonica-1.75.3.tar.gz
>>> cd leptonica-1.75.3
>>> ./autobuild
>>> ./configure
>>> make -j
>>> make install
>>> cd ..
>>> # Delete tar.gz file if you like
>>>
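A bare "make -j" places no limit on the number of parallel jobs, which can
exhaust memory on a small VM. One possible way to bound it is sketched
below; job_count is a hypothetical helper, not part of the original script.

```shell
# job_count: print the number of online CPUs, falling back to 1.
job_count() {
  n=$(nproc 2>/dev/null) || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case "$n" in
    ''|*[!0-9]*) n=1 ;;   # anything non-numeric falls back to a single job
  esac
  echo "$n"
}

# Example, in place of a bare "make -j":
#   make -j"$(job_count)"
```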
>>>
>>> # Sanity checks
>>> # Check that libpng is installed: run "whereis libpng" and expect to see
>>> # a path; an empty result means it is missing.
>>> # Check that leptonica is installed: run "ls /usr/local/include" and
>>> # expect to see "leptonica".
>>>
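The manual sanity checks above could also be scripted. The sketch below
covers both checks; has_entry, whereis_found, and check_build_deps are
hypothetical helper names, and /usr/local is assumed as the install prefix
because that is where the script above installs Leptonica.

```shell
# has_entry: succeed if NAME exists directly under DIR.
has_entry() {
  [ -e "$1/$2" ]
}

# whereis_found: succeed if "whereis NAME" reports at least one path.
whereis_found() {
  [ -n "$(whereis "$1" | cut -d: -f2- | tr -d '[:space:]')" ]
}

# check_build_deps: report whether the Leptonica headers are present under
# PREFIX/include (PREFIX defaults to /usr/local).
check_build_deps() {
  prefix=${1:-/usr/local}
  if has_entry "$prefix/include" leptonica; then
    echo "leptonica headers: found"
  else
    echo "leptonica headers: MISSING"
    return 1
  fi
}

# Example:
#   whereis_found libpng || echo "libpng: MISSING"
#   check_build_deps /usr/local
```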
>>>
>>> # Install Tesseract from Source
>>> wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0-beta.1.tar.gz
>>> tar -zxvf 4.0.0-beta.1.tar.gz
>>> cd tesseract-4.0.0-beta.1/
>>> ./autogen.sh
>>> PKG_CONFIG_PATH=/usr/local/lib/pkgconfig \
>>> LIBLEPT_HEADERSDIR=/usr/local/include \
>>> ./configure --with-extra-includes=/usr/local/include \
>>>     --with-extra-libraries=/usr/local/lib
>>> LDFLAGS="-L/usr/local/lib" CFLAGS="-I/usr/local/include" make -j
>>> make install
>>> ldconfig
>>> cd ..
>>> # Delete tar.gz file if you like
>>>
>>>
>>> # Download and install tesseract language files (Tesseract 4 traineddata
>>> # files)
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/osd.traineddata
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/equ.traineddata
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
>>> wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
>>> # Download any other languages you like
>>> mv *.traineddata /usr/local/share/tessdata
>>>
>>>
>>> # Sanity check
>>> # Check that tesseract is installed: run "tesseract --version" and expect
>>> # to see tesseract on the 1st line, leptonica on the 2nd, and the image
>>> # libraries on the 3rd.
>>>
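The version check above could be automated roughly as follows. This is a
sketch: version_from_banner and is_tesseract_4 are hypothetical helpers,
and the first line of the output is assumed to look like
"tesseract <version>".

```shell
# version_from_banner: read "tesseract --version" output on stdin and print
# the version string from the first line (e.g. "tesseract 4.0.0-beta.1").
version_from_banner() {
  awk 'NR == 1 && $1 == "tesseract" { print $2 }'
}

# is_tesseract_4: succeed if the given version string starts with "4.".
is_tesseract_4() {
  case "$1" in
    4.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example (assumes tesseract is on PATH; 2>&1 in case the banner is
# printed on stderr):
#   v=$(tesseract --version 2>&1 | version_from_banner)
#   is_tesseract_4 "$v" && echo "Tesseract 4 detected: $v"
```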
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/d41ebcc5-b3b1-4e66-af8a-c7896814a7cc%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

