Re: [tesseract-ocr] How to prepare fonts folder to train from scratch

2020-03-24 Thread Shree Devi Kumar
As far as I know, no one has replicated the LSTM training from scratch that Ray did.

On Wed, Mar 25, 2020, 01:35 Essam Zaky wrote:
> Hi Dears,
>
> I would like to build *.traineddata from scratch, especially for English and
> Arabic.
>
> So let's talk about English as an example.
> My question is how to

[tesseract-ocr] How to prepare fonts folder to train from scratch

2020-03-24 Thread Essam Zaky
Hi Dears, I would like to build *.traineddata from scratch, especially for English and Arabic. So let's talk about English as an example. My question is how to prepare the fonts folder. I read the https://github.com/tesseract-ocr/tesseract/blob/master/src/training/language-specific.sh file and found the

Re: [tesseract-ocr] Tesseract not recognizing ancient language's code

2020-03-24 Thread Shree Devi Kumar
Please see https://github.com/Shreeshrii/tesstrain-xsa/blob/master/langdata/latin2unicode.sh

It has sed substitution commands for converting transliteration to Unicode for xsa, based on the mapping shown in Wikipedia and other web pages.

On Mon, Mar 23, 2020, 01:58 Wincent Balin wrote:
> Hi
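The idea behind latin2unicode.sh (a table of transliteration-to-Unicode substitutions) can be sketched in Python as below. This is only an illustration under stated assumptions: the table is a hypothetical, partial one with a handful of letters mapped into the Old South Arabian block (U+10A60..U+10A7F), it is not the table from the script, and every entry should be checked against the Unicode chart before real use. The sketch adds two things a plain chain of sed commands does not give for free: NFC normalization, so that "h" + combining dot below and precomposed "ḥ" are treated alike, and a single longest-match-first pass, so one rule can never rewrite another rule's output.

```python
import re
import unicodedata

# Hypothetical, partial table for illustration only (NOT the table used in
# latin2unicode.sh). Keys are NFC transliteration letters; values are
# Old South Arabian code points. Verify against the Unicode chart.
TRANSLIT_TO_ASA = {
    "h": "\U00010A60",       # OLD SOUTH ARABIAN LETTER HE
    "l": "\U00010A61",       # OLD SOUTH ARABIAN LETTER LAMEDH
    "\u1e25": "\U00010A62",  # "ḥ" -> OLD SOUTH ARABIAN LETTER HETH
    "m": "\U00010A63",       # OLD SOUTH ARABIAN LETTER MEM
    "w": "\U00010A65",       # OLD SOUTH ARABIAN LETTER WAW
    "r": "\U00010A67",       # OLD SOUTH ARABIAN LETTER RESH
    "b": "\U00010A68",       # OLD SOUTH ARABIAN LETTER BETH
    "k": "\U00010A6B",       # OLD SOUTH ARABIAN LETTER KAPH
    "n": "\U00010A6C",       # OLD SOUTH ARABIAN LETTER NUN
}


def compile_table(table):
    # Try longer keys first so that, if the table ever contains
    # multi-character sequences, they win over their one-letter prefixes.
    keys = sorted(table, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in keys))


def translit_to_unicode(text, table=TRANSLIT_TO_ASA):
    # NFC turns "h" + U+0323 COMBINING DOT BELOW into precomposed "ḥ"
    # (U+1E25), so both spellings hit the same table entry.
    text = unicodedata.normalize("NFC", text)
    pattern = compile_table(table)
    # One left-to-right pass: replaced characters are never scanned again.
    # Characters not in the table (spaces, digits, ...) pass through unchanged.
    return pattern.sub(lambda m: table[m.group(0)], text)
```

For example, translit_to_unicode("mlk") returns the three code points MEM, LAMEDH, KAPH in logical order; the right-to-left display is left to the renderer.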

Re: [tesseract-ocr] Help for training Akkadian language for Tesseract 4 needed

2020-03-24 Thread Shree Devi Kumar
>
> How come all the characters that appear are Unicode replacement characters? Did
> I misconfigure something?
>
This could be a locale or encoding issue. It needs to be a Unicode text file. I open the file in Notepad++ on Windows 10, encode it in UTF-8, and run training remotely on an Ubuntu machine.
>
> Is the
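Since replacement characters usually point to an encoding problem, a quick byte-level check of the training text can rule that out before training. The Python sketch below is a hypothetical helper, not part of Tesseract's tooling: it reads raw bytes (so the result does not depend on the Ubuntu machine's locale), reports whether the bytes decode as UTF-8, whether a UTF-8 BOM is present (Notepad++ offers both "UTF-8" and "UTF-8-BOM", so it is worth knowing which one was saved), and how many U+FFFD characters are already stored in the file.

```python
import codecs


def inspect_training_text(data: bytes) -> dict:
    """Report basic encoding facts about raw training-text bytes.

    Hypothetical helper for illustration; it only inspects bytes and
    does not depend on LANG / LC_ALL or any Tesseract setting.
    """
    report = {"has_bom": data.startswith(codecs.BOM_UTF8)}
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8.
        report["valid_utf8"] = False
        report["first_bad_byte"] = err.start
        return report
    report["valid_utf8"] = True
    # U+FFFD stored in the file means data was already lost by an
    # earlier conversion, before Tesseract ever saw it.
    report["replacement_chars"] = text.count("\ufffd")
    return report
```

Usage, with a hypothetical file name: inspect_training_text(pathlib.Path("akk.training_text").read_bytes()).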

[tesseract-ocr] Problems with PDF output from tesseract

2020-03-24 Thread che
Hello, I am using the following version of the software:

 tesseract 4.0.0
  leptonica-1.76.0
   libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : libopenjp2 2.3.0
  Found AVX512BW
  Found AVX512F
  Found AVX2
  Found AVX
  Found SSE

I am trying to convert a .tif into PDF within a
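The snippet is cut off before the actual problem is described, so the following is only a baseline to compare against, not a diagnosis. It is a minimal Python sketch, assuming the tesseract binary is on PATH, of the standard command-line form `tesseract imagename outputbase [options...] [configfile...]` with the built-in `pdf` config, which writes `<outputbase>.pdf`. The function names and the default `eng` language are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def tesseract_pdf_command(image: str, output_base: str, lang: str = "eng") -> list:
    # Standard CLI shape: tesseract <image> <outputbase> -l <lang> pdf
    # The trailing "pdf" config selects the PDF renderer; tesseract
    # appends ".pdf" to output_base itself.
    return ["tesseract", image, output_base, "-l", lang, "pdf"]


def tif_to_pdf(image: str, output_base: str, lang: str = "eng") -> Path:
    # Hypothetical wrapper: runs tesseract and raises CalledProcessError
    # if it exits with a non-zero status.
    subprocess.run(tesseract_pdf_command(image, output_base, lang), check=True)
    return Path(output_base + ".pdf")
```

Calling tif_to_pdf("page.tif", "page") should produce page.pdf next to the input, which makes it easy to tell whether a problem lies in Tesseract itself or in the surrounding script.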