Re: [tesseract-ocr] Looking for a book pages pictures database

2016-09-27 Thread ShreeDevi Kumar
Have you seen https://github.com/tesseract-ocr/tesseract/wiki/TestingTesseract On 26 Sep 2016 11:28 p.m., "Pedro Correia" wrote: > Hi there, I'm currently testing different custom thresholding methods with > tesseract and I need a database of book pages pictures in

Re: [tesseract-ocr] Training a new font with tesstrain.sh failed at phase M

2016-09-26 Thread ShreeDevi Kumar
Are you trying to train for English language with an Arabic font? On 27 Sep 2016 10:01 a.m., "Kiều Vương" wrote: > - I need to train a new font on Ubuntu 14.04, tesseract 3.0.5, > leptonica-1.73 > - I had prepared: .tif file of font, font_properties text file and follow

Re: [tesseract-ocr] Shapeclustering crashes on linux

2016-09-22 Thread ShreeDevi Kumar
Training can take long and can crash again later too. You can try training without shape clustering also, mftraining I think will create a flat shape table in that case. You can compare both. On 22 Sep 2016 7:24 p.m., "rkvsraman" wrote: > Hi, > > Shapeclustering doesn't

Re: [tesseract-ocr] Shapeclustering crashes on linux

2016-09-22 Thread ShreeDevi Kumar
>From readme.md of langdata To re-create the training of a single language, lang, you need the following: All the data in the lang directory. san/*.* The corresponding unicharset/xheights files for the script(s) used by lang. Devanagari.* All the remaining non-lang-specific files in the

Re: [tesseract-ocr] Shapeclustering crashes on linux

2016-09-22 Thread ShreeDevi Kumar
Warning: properties incomplete for index 93 = प्र Warning: properties incomplete for index 94 = क्रि Warning: properties incomplete for index 95 = २ Warning: properties incomplete for index 96 = ५ These errors will get eliminated / reduced if your langdata has the Devanagari.unicharset and

Re: [tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-21 Thread ShreeDevi Kumar
For the two trainings uploaded in imagessan, I used commandline, with tesstrain.sh shell script. For GUI, I use VietOCR. http://vietocr.sourceforge.net/ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Sep 21,

Re: [tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-21 Thread ShreeDevi Kumar
Also see the san.config file in the langdata directory ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Sep 21, 2016 at 2:28 PM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > I had used the new t

Re: [tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-21 Thread ShreeDevi Kumar
ntries in unicharset. > > How did you manage to get 1645 entries with it? > > > > Best Regards > -Raman > > > > On Wed, Sep 21, 2016 at 9:21 AM, ShreeDevi Kumar <shreesh...@gmail.com> > wrote: > >> For Sanskrit, please see https://github.com/Shree

Re: [tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-20 Thread ShreeDevi Kumar
For Sanskrit, please see https://github.com/Shreeshrii/imagessan where I have added the training sources as well as traineddata for two versions of training. In the testing I did on a small sample of images, it seemed to perform better than the 3.04 san.traineddata. You are welcome to try using

Re: [tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-20 Thread ShreeDevi Kumar
Hindi with cube model was included with version 3.02 (or 3.01). Marathi and Sanskrit tessdata without cube model were released as part of version 3.04. While there has been talk of cube model being experimental (scant information is available for it) and plans for it to be discontinued, 3.04 did

Re: [tesseract-ocr] Warning in pixReadMemJpeg: work-around: writing to a temp file Error

2016-09-02 Thread ShreeDevi Kumar
This is probably a warning message from leptonica. Please check for out.txt; it may have been created. - sent from my phone. excuse the brevity. On 31-Aug-2016 5:29 PM, "Onder Boydak" wrote: > Hi > > I am using Tesseract OCR on Mac Osx Yosemite. > > Tesseract command is giving the

Re: [tesseract-ocr] tesseract installation

2016-09-01 Thread ShreeDevi Kumar
You need to choose the correct option from {i686,x86_64} for your system Try pacman -S mingw-w64-tesseract-ocr-osd mingw-w64-x86_64-tesseract-ocr-eng ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Sep 1, 2016

Re: [tesseract-ocr] Re: 1 Text2Image.exe binary please?

2016-08-28 Thread ShreeDevi Kumar
Gautam, That is incorrect. tesseract.exe is the program for doing OCR - it converts an image to text. text2image.exe is just the opposite. It takes text and converts it to images (tiff files) using different fonts. It is used to create training data - box/tiff pairs, for tesseract. As Quan

Re: [tesseract-ocr] First version of tesseract4java -- a GUI for training and running Tesseract -- released

2016-08-20 Thread ShreeDevi Kumar
Is this similar to Quan's JTessBoxEditor and VietOCR? I downloaded tesseract4java-0.1.0-windows-x86_64.jar and tried to run it - it gives fatal error - no jnilept in java.library.path ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] tessdata on github

2016-08-19 Thread ShreeDevi Kumar
On 19/08/2016 00:22, ShreeDevi Kumar wrote: > >> I am wondering whether it would be possible to download only the needed >> traineddata files from tessdata repo (optional) into the designated >> tessdata-dir (which has the required tessdata files). >> >> I found the followi

Re: [tesseract-ocr] tessdata on github

2016-08-18 Thread ShreeDevi Kumar
Someone more familiar with git and github can suggest whether submodules would be a good option for langdata and tessdata, https://git-scm.com/book/en/v2/Git-Tools-Submodules ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] tessdata on github

2016-08-18 Thread ShreeDevi Kumar
I am wondering whether it would be possible to download only the needed traineddata files from tessdata repo (optional) into the designated tessdata-dir (which has the required tessdata files). I found the following options but haven't been able to try them out yet .. 1.
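
A minimal sketch of fetching a single traineddata file rather than cloning the whole tessdata repo. This is not one of the options listed in the message (those are cut off above); it assumes curl is installed, reuses the raw-file URL pattern from the lat.traineddata link quoted in another thread in this archive, and fetch_traineddata is a made-up helper name. The URL may no longer be valid.

```shell
# Download one traineddata file into a tessdata directory.
# fetch_traineddata is a hypothetical helper, not part of Tesseract.
# The URL pattern is an assumption copied from a link quoted in this
# archive and may have changed since.
fetch_traineddata() {
    lang=$1
    dir=${2:-./tessdata}
    mkdir -p "$dir" || return 1
    # -L follows GitHub's redirect to the raw file; --fail turns HTTP
    # errors into a non-zero exit status instead of saving an error page.
    curl -L --fail -o "$dir/$lang.traineddata" \
        "https://github.com/tesseract-ocr/tessdata/blob/master/$lang.traineddata?raw=true"
}
```

Usage would be something like `fetch_traineddata lat /path/to/tessdata`, followed by `tesseract --tessdata-dir` pointing at that directory.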

Re: [tesseract-ocr] tess 3.04.01 TOOLS install on centos 6.8

2016-07-22 Thread ShreeDevi Kumar
sudo apt-get install libicu-dev # (if you plan to make the training tools) sudo apt-get install libpango1.0-dev # (if you plan to make the training tools) sudo apt-get install libcairo2-dev # (if you plan to make - sent from my phone. excuse the brevity. On 22-Jul-2016 6:47 PM, "Gary Evensen"

Re: [tesseract-ocr] Unable to recognise the text with the traineddata

2016-07-22 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Jul 20, 2016 at 6:37 PM, koushik v wrote: > Hi, > > I took a screenshot of

Re: [tesseract-ocr] Can any feature of tesseract auto detect language (or majority language) of the image?

2016-07-19 Thread ShreeDevi Kumar
Try ocr in google drive, it auto detects the languages. - sent from my phone. excuse the brevity. On 19-Jul-2016 12:17 PM, "Ashish Goel" wrote: > I have 100s of images in different languages that I need to OCR. > Presently, I need to know in advance the language of the

Re: [tesseract-ocr] Font_properties file

2016-07-18 Thread ShreeDevi Kumar
You need to set the values based on the font eg. If it is a fixed width font, then set that to 1. If it is a serif font then set that to 1, for sans serif, set it to 0. Similarly if your font is bold or italic, set the corresponding values to 1. So, you need to set values based on the American

Re: [tesseract-ocr] Font_properties file

2016-07-18 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/langdata/blob/master/font_properties ITC_American_Typewriter_Std_Bold 0 1 0 1 0 ITC_American_Typewriter_Std_Bold_Condensed 0 1 0 1 0 ITC_American_Typewriter_Std_Bold_Italic 1 1 0 1 0 ITC_American_Typewriter_Std_Condensed 0 0 0 1 0
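
The entries above follow the layout `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, as far as I understand the 3.x training documentation. A rough sanity check for a font_properties file could look like the sketch below; check_font_properties is a made-up helper name and the check only looks at field count and 0/1 values.

```shell
# Report lines of a font_properties file that do not look like
#   <fontname> <italic> <bold> <fixed> <serif> <fraktur>
# check_font_properties is a hypothetical helper, not a Tesseract tool.
check_font_properties() {
    awk '
        NF == 0 { next }    # ignore blank lines
        NF != 6 {
            printf "line %d: expected 6 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        {
            # Fields 2..6 are flags and must each be 0 or 1.
            for (i = 2; i <= 6; i++)
                if ($i != "0" && $i != "1") {
                    printf "line %d: field %d is \"%s\", expected 0 or 1\n", NR, i, $i
                    bad = 1
                }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

The function returns non-zero when any line is malformed, so it can be run before mftraining in a training script.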

Re: [tesseract-ocr] failed to load font_properties from font_properties

2016-07-17 Thread ShreeDevi Kumar
Also check... When running mftraining, each fontname field in the *.tr file must match an fontname entry in the font_properties file, or mftraining will abort. - sent from my phone. excuse the brevity. On

Re: [tesseract-ocr] failed to load font_properties from font_properties

2016-07-17 Thread ShreeDevi Kumar
Since you are using the latest source, why don't you use tesstrain.sh for training? Just change the list of fonts for English to the one you want to use. - sent from my phone. excuse the brevity. On 17-Jul-2016 4:10 PM, "koushik v" wrote: > yeah tried..not working > > On

Re: [tesseract-ocr] failed to load font_properties from font_properties

2016-07-16 Thread ShreeDevi Kumar
Make sure that file has an empty line at end - sent from my phone. excuse the brevity. On 16-Jul-2016 11:05 PM, "koushik v" wrote: > Hi, > > I am following the tesseract ocr guide for training and i am stuck on the > font_properties step.I repeatedly get error on the
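
One way to apply this suggestion from the command line, as a sketch. It reads "empty line at end" as "the last line ends with a newline", which is an interpretation on my part, and ensure_trailing_newline is a made-up helper name.

```shell
# Append a newline to a file only if its last byte is not already one.
# ensure_trailing_newline is a hypothetical helper, not a Tesseract tool.
ensure_trailing_newline() {
    file=$1
    # An empty file has nothing to terminate.
    [ -s "$file" ] || return 0
    # Command substitution strips a trailing newline, so the result is
    # empty exactly when the last byte of the file is a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}
```

Running `ensure_trailing_newline font_properties` twice is harmless; the second call leaves the file unchanged.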

Re: [tesseract-ocr] Das tutorial 2016

2016-07-14 Thread ShreeDevi Kumar
Thanks. I was trying download to my phone. Will try with raw file on pc. - sent from my phone. excuse the brevity. On 13-Jul-2016 10:58 PM, "Marco Atzeri" <marco.atz...@gmail.com> wrote: > On 13/07/2016 19:05, ShreeDevi Kumar wrote: > >> https://github.com/te

[tesseract-ocr] Das tutorial 2016

2016-07-13 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/docs/tree/master/das_tutorial2016 I tried to download the files from above link but getting an error. Wondering whether problem is at my end or the files need to be re uploaded... - sent from my phone. excuse the brevity. -- You received this message because

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-30 Thread ShreeDevi Kumar
unable to read cube language model >>>> params from /usr/local/share/tessdata/san3ds.cube.lm >>>> Cube ERROR (CubeRecoContext::Create): unable to init CubeRecoContext >>>> object >>>> init_cube_objects(true, _manager):Error:Assert failed:in file >>>>

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-18 Thread ShreeDevi Kumar
generating cube-word-dawg. > > Thanks in advance > Rohit > > > On Mon, Jun 13, 2016 at 7:04 PM, ShreeDevi Kumar <shreesh...@gmail.com> > wrote: > >> If you look at the readme files in the diff subdirectories starting with >> OCR under >> https://githu

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-15 Thread ShreeDevi Kumar
is is happening? Is it wrong in renaming >> word-dawg, I cannot find any separate option for generating cube-word-dawg. >> >> Thanks in advance >> Rohit >> >> >> On Mon, Jun 13, 2016 at 7:04 PM, ShreeDevi Kumar <shreesh...@gmail.com> >> wrote: >>

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-13 Thread ShreeDevi Kumar
not used it for actual OCR of any text because sanskritocr software by dr. Oliver hellwig gives better results. See https://sites.google.com/site/sanskritcode/ocr/1-ocr-ing - sent from my phone. excuse the brevity. On 13-Jun-2016 6:53 pm, "ShreeDevi Kumar" <shreesh...@gmail.com&g

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-12 Thread ShreeDevi Kumar
Google has not provided images and box files for the San.traineddata released for 3.04. I tried training using text2image with a combination of different fonts and training text. Results are at https://github.com/Shreeshrii/imagessan/tree/master/tessdata You can give these a try to see if recognition

Re: [tesseract-ocr] Do we have Sanskrit training images and box files online?

2016-05-14 Thread ShreeDevi Kumar
I have some old files at https://sourceforge.net/projects/tesseracthindi/files/?source=navbar There are newer traineddata files at https://github.com/Shreeshrii/imagessan - I have not uploaded the box files there. If you have training tools installed, you can use the text2image program with the

Re: [tesseract-ocr] how to compile tesseract for windows on a linux machine?

2016-05-14 Thread ShreeDevi Kumar
There is an archlinux distribution for tesseract - see https://www.archlinux.org/packages/community/i686/tesseract/ ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, May 11, 2016 at 3:31 PM, Simon Eigeldinger

Re: [tesseract-ocr] Empty Result

2016-04-29 Thread ShreeDevi Kumar
Preprocess the image by adding a whitespace margin before running tesseract. Also read recent posts where a similar issue was discussed. - sent from my phone. excuse the brevity. On 29-Apr-2016 2:51 pm, "Odun Adeboye" wrote: > Hello, > > I got empty result on the attached image
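
A sketch of adding such a margin with ImageMagick. The message does not name a tool, so using ImageMagick's convert here is an assumption, as are the helper name add_margin and the 20-pixel default.

```shell
# Pad an image with a white border before OCR.
# add_margin is a hypothetical helper; it assumes ImageMagick's
# `convert` command is on the PATH.
add_margin() {
    src=$1
    dst=$2
    px=${3:-20}    # border width in pixels on every side (assumed default)
    convert "$src" -bordercolor white -border "${px}x${px}" "$dst"
}
```

For example, `add_margin scan.png scan_padded.png 30` followed by `tesseract scan_padded.png out`. The right margin size depends on the image, so it is worth trying a few values.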

Re: [tesseract-ocr] Proper Use of Text2Image?

2016-04-17 Thread ShreeDevi Kumar
Please use the font name instead of the TTF file name. It may be 'Bradley Hand ITC' - sent from my phone. excuse the brevity. On 17-Apr-2016 8:38 pm, "John Timuty" wrote: > Hi there! ^_^ > I didn't know how to compile so I had to download Cygwin because only > there i got

[tesseract-ocr] unicharambigs for complex scripts

2016-04-16 Thread ShreeDevi Kumar
Hi, I am finding in the case of Devanagari that sometimes the dependent vowel (maatraa) follows the consonant with a space in between. Is there a way in unicharambigs to have the space and maatraa be replaced mandatorily just by the maatraa eg. 1 ः 1 ः 1 2 म ः 1 मः 1 2 ा ः 1 ाः 1 Are the above

[tesseract-ocr] das 2016

2016-04-15 Thread ShreeDevi Kumar
Any update on Ray's tutorial on tesseract-ocr at DAS 2016 http://www.primaresearch.org/das2016/assets/DAS2016_Tutorial_Tesseract.pdf ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com -- You received this message because

Re: [tesseract-ocr] unicharset_extractor: command not found

2016-04-11 Thread ShreeDevi Kumar
Please see https://trac.macports.org/browser/trunk/dports/textproc/tesseract/Portfile Looks like macports does not include training tools ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Apr 11, 2016 at 7:48 AM,

Re: [tesseract-ocr] Re: Gujarati OCR

2016-04-06 Thread ShreeDevi Kumar
You can use tesseract with gujarati traineddata or try it with Vietocr GUI. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Apr 5, 2016 at 11:16 PM, Pravin Kothari wrote: > Does this

Re: [tesseract-ocr] How to train tesseract in Windows?

2016-04-04 Thread ShreeDevi Kumar
Install cygwin and download tesseract packages including training utils. >>On cygwin Marco Atzeri has packaged Tesseract as well as the training utilities for 3.04.00 along with some training data. Instruction for cygwin installation is here: https://cygwin.com/cygwin-ug-net/setup-net.html

[tesseract-ocr] building training tools on cygwin

2016-03-29 Thread ShreeDevi Kumar
Hi, I have been able to build latest source of tesseract on cygwin. ra@Shree ~/tesseract-ocr/tesseract $ tesseract -v tesseract 3.05.00dev-296-g60176fc leptonica-1.73 libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 However, make

Re: [tesseract-ocr] Re: Segmentation fault 3.04.01

2016-03-15 Thread ShreeDevi Kumar
tesseract --list-langs --tessdata-dir /usr/local/share/ Try specifying the directory in the command line. I have tessdata in two different places and I can list them as follows: User@HP MINGW32 ~/tesseract-ocr $ tesseract --list-langs --tessdata-dir ./ List of available languages (17): ara

Re: [tesseract-ocr] how to compile tesseract on msys2/mingw?

2016-03-09 Thread ShreeDevi Kumar
wrote: > Thanks for the info. > might have a look at this. > > greetings, > simon > > > Am 05.03.2016 um 05:09 schrieb ShreeDevi Kumar: > >> >> https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-tesseract-ocr/PKGBUILD >> >> Modi

Re: [tesseract-ocr] Updated: tesseract-ocr-3.04.01-1

2016-03-05 Thread ShreeDevi Kumar
Release Notes https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes#tesseract-release-notes-feb-16-2016---v30401 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 5, 2016 at 2:35 PM, Marco Atzeri

[tesseract-ocr] 3.04.01 Release

2016-03-05 Thread ShreeDevi Kumar
> > ​​https://github.com/tesseract-ocr/tesseract/releases/tag/3.04.01 > > Latest release 3.04.01 255c31f 3.04.01 release > @zdenop zdenop released this 17 days ago · 262 commits to master since > this release > bug-fix release of 3.04 version ​Is there a process for announcing new

Re: [tesseract-ocr] Update of cygwin package for training

2016-03-05 Thread ShreeDevi Kumar
Hi Marco, https://github.com/tesseract-ocr/tesseract/releases/tag/3.04.01 Please update cygwin with the new release. Thanks! -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving

Re: [tesseract-ocr] how to compile tesseract on msys2/mingw?

2016-03-04 Thread ShreeDevi Kumar
eDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 5, 2016 at 10:35 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote: > I had compiled it a couple of years back but do not have all the details > now. > >

Re: [tesseract-ocr] how to compile tesseract on msys2/mingw?

2016-03-04 Thread ShreeDevi Kumar
esseract-ocr/tesseract/archive/master.zip or whichever release you want to use before running makepkg ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 5, 2016 at 9:39 AM, ShreeDevi Kumar <shreesh...@gmail.com> wrote:

Re: [tesseract-ocr] how to compile tesseract on msys2/mingw?

2016-03-04 Thread ShreeDevi Kumar
https://github.com/Alexpux/MINGW-packages/blob/master/mingw-w64-tesseract-ocr/PKGBUILD Modify the pkgbuild to use the latest source. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 5, 2016 at 1:38 AM, Simon

Re: [tesseract-ocr] Compiling Tesseract

2016-02-25 Thread ShreeDevi Kumar
See https://github.com/charlesw/tesseract He has used visual studio 2015 - sent from my phone. excuse the brevity. On 25-Feb-2016 9:05 pm, wrote: > Hello everyone, > > I was trying to install and compiling Tesseract first in Visual Studio > 2013 Express for Windows and

Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-30 Thread ShreeDevi Kumar
On cygwin Marco Atzeri has packaged Tesseract as well as the training utilities for 3.04.00 along with some training data. Instruction for cygwin installation is here: https://cygwin.com/cygwin-ug-net/setup-net.html Tesseract specific packages to be installed: tesseract-ocr

Re: [tesseract-ocr] Re: Multiple folders

2015-12-22 Thread ShreeDevi Kumar
See www.tristancollins.me/computing/ocr-using-tesseract-on-multipage-pdfs/ You can setup nested for loops, outer one for folders for the books and the inner one to handle the pages within the book. - sent from my phone. excuse the brevity. I would really like to do it in linux command line. So,
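
The nested loops described in this message might look like the sketch below. The layout of one subdirectory per book with .tif page images inside, and the helper name ocr_books, are assumptions for illustration.

```shell
# Outer loop: one subdirectory per book. Inner loop: one image per page.
# ocr_books and the <root>/<book>/<page>.tif layout are assumptions.
ocr_books() {
    root=${1:-.}
    for book in "$root"/*/; do
        [ -d "$book" ] || continue
        for page in "$book"*.tif; do
            # An unmatched pattern is left as-is; skip it.
            [ -e "$page" ] || continue
            # Strip the extension so p1.tif produces p1.txt next to it.
            tesseract "$page" "${page%.tif}"
        done
    done
}
```

Calling `ocr_books /path/to/books` writes one text file per page next to each image; add `-l <lang>` to the tesseract line for non-English books.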

Re: [tesseract-ocr] Re: Multiple folders

2015-12-22 Thread ShreeDevi Kumar
Also see www.morethantechnical.com/2013/11/21/creating-a-searchable-pdf-with-opensource-tools-ghostscript-hocr2pdf-and-tesseract-ocr/ - sent from my phone. excuse the brevity. On 22-Dec-2015 1:42 pm, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > See www.tristancollins.me/

Re: [tesseract-ocr] how to use tesstrain .sh etc in ubuntu 15.10

2015-11-28 Thread ShreeDevi Kumar
Great to hear that you successfully generated Kannada traineddata using tesstrain.sh. Did you test to see whether there is difference/improvement in recognition compared to the kan traineddata provided by Google? The terminal extract also indicated a 'flat shape table' . - sent from my phone.

Re: [tesseract-ocr] v3.04 Release???

2015-11-20 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tesseract/wiki/Compiling ** 3.04 requires at least v1.71 of Leptonica.** ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Nov 20, 2015 at 6:49 PM, Supriya Das

Re: [tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread ShreeDevi Kumar
There is marathi traineddata. However that is not trained with cube engine and hence may not be as accurate. http://packages.ubuntu.com/wily/tesseract-ocr-mar You can test with both hin and mar and report your experience. Thanks! - sent from my phone. excuse the brevity. On 28 Oct 2015 14:16,

Re: [tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread ShreeDevi Kumar
For indian languages also check out OCR feature in google drive/docs. - sent from my phone. excuse the brevity. On 28 Oct 2015 17:34, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > There is marathi traineddata. However that is not trained with cube engine > and hence

Re: [tesseract-ocr] Is there any difference using Tesseract on a mac or pc ?

2015-10-14 Thread ShreeDevi Kumar
manpages.ubuntu.com/manpages/precise/man1/tesseraact.1.html - sent from my phone. excuse the brevity. On 14 Oct 2015 19:40, "Bill Wong" wrote: > I've been comparing for the same image on PC and MAC, the results differ a > lot. > My images are PNG files, in french

Re: [tesseract-ocr] Is there any difference using Tesseract on a mac or pc ?

2015-10-14 Thread ShreeDevi Kumar
To use a particular language, the syntax is -l fra, not -fra. - sent from my phone. excuse the brevity. On 14 Oct 2015 19:40, "Bill Wong" wrote: > I've been comparing for the same image on PC and MAC, the results differ a > lot. > My images are PNG files, in french

Re: [tesseract-ocr] Tesseract 3.04 error.

2015-09-16 Thread ShreeDevi Kumar
compiling it myself . > > On Wednesday, September 16, 2015 at 12:36:47 AM UTC-4, shree wrote: >> >> Does your input filename have a space in it? >> >> - sent from my phone. excuse the brevity and typos. >> On 16 Sep 2015 10:05, "ShreeDevi Kumar" <shree

Re: [tesseract-ocr] Tesseract 3.04 error.

2015-09-15 Thread ShreeDevi Kumar
Does your input filename have a space in it? - sent from my phone. excuse the brevity and typos. On 16 Sep 2015 10:05, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > Did u check if the output file is created? > > That is just a warning from leptonica. > &g

Re: [tesseract-ocr] Re: Traineddata inspector

2015-09-09 Thread ShreeDevi Kumar
Hello Jozef, Thank you for this tool. It is very helpful to have a visual look at inttemp. I tried it with hin.traineddata (devanagri script) as well as some custom trained data. The inttemp display does not seem to correspond to the titles for the boxes. When I checked for eng.traineddata they

Re: [tesseract-ocr] Re: persian in tesseract-ocr

2015-08-17 Thread ShreeDevi Kumar
On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful. Ray - https://groups.google.com/forum/#!msg/tesseract-dev/qcFtWCAAlT8/SZ4xBS5DHwwJ Another

Re: [tesseract-ocr] Re: persian in tesseract-ocr

2015-08-16 Thread ShreeDevi Kumar
Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful. As far as I know, Google Docs does not use tesseract OCR engine for recognizing the text. Its OCR accuracy is better than Tesseract for some Indian languages also. However, it doesn't

Re: [tesseract-ocr] Re: Memory leak

2015-08-14 Thread ShreeDevi Kumar
It may be best to post this as an issue. - sent from my phone. excuse the brevity and typos. On 13 Aug 2015 15:30, Anshul Maheshwari anshul.ffm...@gmail.com wrote: I have pasted valgrind output, where tesseract is just linked not used any single api of tessearct in my code. then it have

Re: [tesseract-ocr] building on cygwin with training data

2015-08-02 Thread ShreeDevi Kumar
On Sun, Aug 2, 2015 at 3:25 PM, Marco Atzeri marco.atz...@gmail.com wrote: On 8/2/2015 10:31 AM, ShreeDevi Kumar wrote: + tesseract-dev google group Thank you, Marco. I will download the training tools packages and give it a try. In future updates to the tesseract package, may I

Re: [tesseract-ocr] FreeOCR stops working when trying to OCR in Greek (ell or grc) languages

2015-07-31 Thread ShreeDevi Kumar
I am assuming that FreeOCR is using an older version of tesseract engine and hence does not support the newer traineddata files for grc etc. On Windows, you can give a try to the binaries built by Simon on cygwin with the latest code from github - http://domasofan.spdns.eu/tesseract/ ShreeDevi

Re: [tesseract-ocr] tesseract on cygwin

2015-07-27 Thread ShreeDevi Kumar
- International Components for Unicode: Layout library icu-lx icu-lx - International Components for Unicode: Paragraph Layout library $ pkg-config --libs icu-i18n -licui18n -licuuc -licudata -lpthread -lm On 7/27/2015 9:05 AM, ShreeDevi Kumar wrote: Marco, Please see

Re: [tesseract-ocr] tesseract on cygwin

2015-07-27 Thread ShreeDevi Kumar
**: training tools don't build #61* - sent from my phone. excuse the brevity and typos. On 27 Jul 2015 11:50, Marco Atzeri marco.atz...@gmail.com wrote: On 7/27/2015 4:54 AM, ShreeDevi Kumar wrote: Thank you, Marco. 1. Is there a way to download just the tesseract package and dependencies (like

Re: [tesseract-ocr] tesseract on cygwin

2015-07-26 Thread ShreeDevi Kumar
Thank you, Marco. 1. Is there a way to download just the tesseract package and dependencies (like Simon had setup) for testing purposes for those who do not have a cygwin install? 2. The pdf output option (as far as I understand it) adds the OCRed text layer on top of copy of the original image,

Re: [tesseract-ocr] require tesseract.exe of 3.04 version.

2015-07-25 Thread ShreeDevi Kumar
You can test the Cygwin compiled windows binaries by Simon. However pdf output is not working in it. - sent from my phone. excuse the brevity and typos. On 25 Jul 2015 16:07, Sriranga(81+yrsold) withblessi...@gmail.com wrote: thanks for the information. On 21 July 2015 at 05:09, ShreeDevi

Re: [tesseract-ocr] tesseract on cygwin

2015-07-23 Thread ShreeDevi Kumar
to file. greetings, simon Am 23.07.2015 um 04:55 schrieb ShreeDevi Kumar: http://domasofan.spdns.eu/tesseract/how%20to%20install.txt Excellent instructions, Simon. I am downloading and will give it a try under Windows8. I would suggest that you add 'Tesseract for Windows' as a heading

Re: [tesseract-ocr] displayed version number of tesseract when compiled from git

2015-07-23 Thread ShreeDevi Kumar
Zdenko, Just to confirm, Is it OK to use the newer releases from https://github.com/tesseract-ocr/tesseract/releases for distribution or is the latest code for distribution 3.04.00? Thanks! ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] building tesseract on windows using cygwin

2015-07-21 Thread ShreeDevi Kumar
as any other. Why it should be tagged??? Zdenko On Tue, Jul 21, 2015 at 6:44 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Zdenko, How is this update tagged? Is there a version number with it for future ref. - sent from my phone. excuse the brevity. On 21 Jul 2015 00:09, zdenko

Re: [tesseract-ocr] require tesseract.exe of 3.04 version.

2015-07-21 Thread ShreeDevi Kumar
I don't think a windows binary of 3.04.00 has been made available. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul 20, 2015 at 6:11 PM, sriranga(82yrsold) withblessing.sriranga.1...@gmail.com wrote: From

Re: [tesseract-ocr] Multiple tiff processing

2015-07-21 Thread ShreeDevi Kumar
for f in *.tif; do tesseract "$f" "$f" hocr; done ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Jul 21, 2015 at 4:29 PM, Stathis L. doombringer...@gmail.com wrote: Does anybody know how to process multiple
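
The loop in this message can also be written as a small shell function. This is a sketch: ocr_tifs is a made-up name, and unlike the original one-liner it strips the .tif extension from the output base so that page1.tif does not become page1.tif.hocr. The exact output extension depends on the Tesseract version.

```shell
# Run Tesseract over every .tif file in a directory, writing hOCR
# output next to each image. ocr_tifs is a hypothetical helper name.
ocr_tifs() {
    dir=${1:-.}
    for f in "$dir"/*.tif; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # Quote the names so files with spaces work, and drop the
        # extension from the output base.
        tesseract "$f" "${f%.tif}" hocr
    done
}
```

Run it as `ocr_tifs .` from the folder holding the scans, or pass another directory as the first argument.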

Re: [tesseract-ocr] Multiple tiff processing

2015-07-21 Thread ShreeDevi Kumar
that for loop is for a bash script - please see http://www.cyberciti.biz/faq/bash-for-loop/ for examples - ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Jul 21, 2015 at 6:36 PM, Stathis L.

Re: [tesseract-ocr] differences between version 3.03 and 3.04

2015-07-13 Thread ShreeDevi Kumar
Mark, 3.04 is officially going to be released soon. Can you share your experience with windows build to help in that process. - sent from my phone. excuse the brevity. On 11 Jul 2015 10:44, Mark Seidner topo...@gmail.com wrote: Hi everyone, I downloaded the latest 3.04 code from git and

Re: [tesseract-ocr] differences between version 3.03 and 3.04

2015-07-11 Thread ShreeDevi Kumar
Which traineddata files are you using? For English, as per https://github.com/tesseract-ocr/tessdata/commit/074c37215b01ab8cc47a0e06ff7356383883d775 the new traineddata files are NOT included in 3.04 as Updated 98 traineddata files with the 3.04 training. ara, eng, hin, kor not included as they

Re: [tesseract-ocr] What to Do With Multiple .traineddata Files of the Same Language?

2015-07-10 Thread ShreeDevi Kumar
Usually, if you have multiple traineddata files for the same language, you would give each a distinct name, e.g. eng and en2. Then, to use both: -l eng+en2 or -l en2+eng, depending on which one you want to give priority to. To use only your own traineddata en2: -l en2 - sent from my phone. excuse the
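
A small wrapper showing how the -l argument described here can be assembled. ocr_with_langs is a made-up name, en2 stands for a custom traineddata file as in the message, and treating the first language as the one given priority follows the message rather than anything I have verified.

```shell
# Run Tesseract with one or more languages joined by "+".
# ocr_with_langs is a hypothetical wrapper, not a Tesseract command.
ocr_with_langs() {
    image=$1
    shift
    # Join the remaining arguments with "+", e.g. "en2 eng" -> "en2+eng".
    langs=$(IFS=+; echo "$*")
    # Output base is the image name without its extension.
    tesseract "$image" "${image%.*}" -l "$langs"
}
```

For example, `ocr_with_langs page.png en2 eng` runs `tesseract page.png page -l en2+eng`, and `ocr_with_langs page.png en2` uses only the custom data.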

Re: [tesseract-ocr] What to Do With Multiple .traineddata Files of the Same Language?

2015-07-10 Thread ShreeDevi Kumar
See https://tesseract-ocr.googlecode.com/git/doc/tesseract.1.html for syntax of command - sent from my phone. excuse the brevity On 10 Jul 2015 12:11, ShreeDevi Kumar shreesh...@gmail.com wrote: Usually if you have multiple traineddata for same language, you would give a distinct name to each

Re: [tesseract-ocr] Re: what's the content of fixed-length-dawgs

2015-07-09 Thread ShreeDevi Kumar
Have you tried with the new traineddata files at https://github.com/tesseract-ocr/tessdata ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jul 9, 2015 at 2:55 PM, wfxia...@gmail.com wrote: Hi, Nade, thanks for

Re: [tesseract-ocr] Re: what's the content of fixed-length-dawgs

2015-07-09 Thread ShreeDevi Kumar
Also see the language training data available at https://github.com/tesseract-ocr/langdata ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Jul 9, 2015 at 8:27 PM, ShreeDevi Kumar shreesh...@gmail.com wrote

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
Did you try with the Latin traineddata https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/langdata/tree/master/lat which has the language data used for latin. You can use this as the basis to create your own traineddata file for an old historical version of latin ShreeDevi भजन -

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
, ShreeDevi Kumar shreesh...@gmail.com wrote: Please see https://github.com/tesseract-ocr/langdata/tree/master/lat which has the language data used for latin. You can use this as the basis to create your own traineddata file for an old historical version of latin ShreeDevi

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
, 2015 at 6:41 PM, ShreeDevi Kumar shree...@gmail.com wrote: Please see https://github.com/tesseract-ocr/langdata/tree/master/lat which has the language data used for latin. You can use this as the basis to create your own traineddata file for an old historical version of latin ShreeDevi

Re: [tesseract-ocr] Tesseract 3.02 does not detect inter-word spacing for Bengali language.

2015-05-19 Thread ShreeDevi Kumar
Please try the vietocr gui frontend for tesseract ocr available from http://vietocr.sourceforge.net/ It uses a newer version of tesseract. you can also try using the bengali traineddata available on tesseract site -

Re: [tesseract-ocr] Tessdata for marathi

2015-04-05 Thread ShreeDevi Kumar
I have not done any additional work on that. Not sure when the next release will be and which languages will be supported in it. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Apr 5, 2015 at 11:55 PM, Ash L

Re: [tesseract-ocr] Training for plotter file

2015-03-22 Thread ShreeDevi Kumar
vietocr has bulkocr and batch options. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Mar 22, 2015 at 6:39 AM, Dennis dennisg...@gmail.com wrote: I'm using the latest version of tesseract: 3.02. I

Re: [tesseract-ocr] Preparing training data for new language

2015-03-15 Thread ShreeDevi Kumar
Please see http://www.ucsc.cmb.ac.lk/sdu/research.html http://192.248.22.122/ocrsinhala/upload.php Here is the output from it: ටුද්‍රණි:ල .ය්චත වැට වරීජන:: ඵාෂ්. ඨ:ර්චූකට පවන්චි:යගැ න ::න චූට කූ- එ0 දූකූ:ගයගැ 0පි පිශ්‍රීබඳව රජය:ෘන් ඉදීරිෂන් කූයරන ය:ට,රණ් ච්ඝ දූ0කට 9දාද්‍රඩා භ:තපිජං .ාරීග ාඝන්

Re: [tesseract-ocr] German doucment

2015-03-09 Thread ShreeDevi Kumar
German language code is deu NOT dau ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Mar 9, 2015 at 9:06 AM, Ofer Rosenberg rosenberg.o...@gmail.com wrote: Hello, I have a problem when running tesseract for a

Re: [tesseract-ocr] Re: Android OCR application looking for quality improvments

2015-03-09 Thread ShreeDevi Kumar
have you followed the suggestions given on https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Mar 9, 2015 at 10:26 AM, Daniel danieluc...@gmail.com wrote:

Re: [tesseract-ocr] Re: Steps to Configure Tesseract OCR for tamil Language

2015-03-01 Thread ShreeDevi Kumar
http://sourceforge.net/projects/tesseracthindi/files/?source=navbar you can take the training files from there and improve. If the work is for an NGO, you can also contact IISC for Tamil and Kannada OCR - please see

Re: [tesseract-ocr] Unable to locate dictionary files

2015-02-02 Thread ShreeDevi Kumar
https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Feng https://code.google.com/p/tesseract-ocr/source/browse?repo=tessdata#git http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/combine_tessdata.1.html Specify option -u to unpack all the components to the specified
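
A sketch of the -u usage referred to in this message, wrapped in a helper. unpack_traineddata is a made-up name; combine_tessdata itself ships with the Tesseract training tools, so this assumes they are installed.

```shell
# Unpack all components of a traineddata file into a directory.
# unpack_traineddata is a hypothetical helper around combine_tessdata.
unpack_traineddata() {
    traineddata=$1
    outdir=${2:-.}
    base=$(basename "$traineddata" .traineddata)
    mkdir -p "$outdir" || return 1
    # The second argument is a path prefix, not a directory: the
    # trailing dot gives component files named like eng.unicharset.
    combine_tessdata -u "$traineddata" "$outdir/$base."
}
```

For example, `unpack_traineddata /usr/local/share/tessdata/eng.traineddata ./eng_parts` should leave the dictionary (dawg) files and other components in ./eng_parts.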

Re: [tesseract-ocr] Help needed in understanding source. New to tesseract.

2015-02-01 Thread ShreeDevi Kumar
You can look at http://zdenop.github.io/tesseract-doc/ http://fossies.org/dox/tesseract-ocr-3.02.02/index.html https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0usp=sharing https://code.google.com/p/tesseract-ocr/wiki/Documentation ShreeDevi

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-20 Thread ShreeDevi Kumar
Have you looked at imagemagick and related scripts for pre-processing the images? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Jan 21, 2015 at 1:30 AM, newbie spens.mallang...@gmail.com wrote: I found that

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-09 Thread ShreeDevi Kumar
://bhajans.ramparivar.com On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar shree...@gmail.com wrote: you should *uninstall the old version fully* and then build the version from git. It is possibly referring to some older libraries. Also, this needs leptonica 1.71. Not sure if the documentation

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-09 Thread ShreeDevi Kumar
As far as I know, pdf creation is a new addition and the issues were ironed out only recently. There have been over 100 commits to the code since 3.03 rc. If you want the new functionality, you can try compiling the code from https://code.google.com/p/tesseract-ocr/source/checkout Instructions
