On Tue, Sep 3, 2013 at 8:54 AM, Shree Devi Kumar <shreesh...@gmail.com> wrote:
> I updated tesseract to the latest version in svn and now I am getting
> errors while running training ..
>
>
> D:\BuildFolder\testing\TRAINdata\v6-TransliterationOnly>echo off
> tesseract 3.02.03
>  leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
>   libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
>
> **** extracting unicharset *****
> Extracting unicharset from ipa.sanskrit2003.exp994.box
> Wrote unicharset file ./unicharset.
> **** done extracting unicharset from *****
> **** ipa.sanskrit2003.exp994.box ****
> **** Training using following .tr files *****
> **** ipa.sanskrit2003.exp994.tr ****
> **** NO Shapeclustering - Non Indic Language*****
> **** Started MFTraining *****
> Read shape table shapetable of 733 shapes
> Reading ipa.sanskrit2003.exp994.tr ...
>
> id < this->size():Error:Assert failed:in file ..\..\ccutil\unicharset.cpp, line 237
>

What was your last working tesseract version? Did you use the svn version in
the past?

> Has anyone else had this problem?
>
>
> Additionally, for sanskrit language data
> I am getting errors while running OCR on .png images - it worked fine earlier.
>
>         1 file(s) copied.
> tesseract 3.02.03
>  leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
>   libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5
>
> processing san.0s2003.exp0.tif
> processing san.0s2003.exp8.tif
> processing san.0sanskrit2003.exp0.tif
> processing san.0sanskrit2003.exp8.tif
> processing san.mnt.exp013.png
> TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
> processing san.mnt.exp014.png
> TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
> processing san.mnt.exp031.png
> TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
> processing san.mnt.exp032.png
> TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
> processing san.mnt.exp038.png
> TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
> processing san.mnt.exp424.png
> TIFFstream: Not a TIFF file, bad magic number 20617 (0x5089).
> Press any key to continue . . .
>
>
> Should I open issues for the above?
>

"TIFFstream: Not a TIFF file" should be an error from leptonica, so please
test the image with a leptonica program. If the problem is still there,
create an issue at the leptonica project.
Strangely, the output shows that you are processing a PNG, but the error is
about TIFF. Check that everything is OK with the filename.

>
> Shree Devi Kumar
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Thu, Aug 22, 2013 at 3:10 PM, Shree <shreesh...@gmail.com> wrote:
>
>> I had started training Tesseract for recognizing texts which have Indic
>> transliteration - please see
>> http://www.unicode.org/cldr/charts/transforms/Latin-Indic.html for the
>> diacritics used for the same.
>>
>> After Ray's post regarding upcoming merge and next release, I am holding
>> off on further training.
>>
>> However, I wanted to check whether this is already available as part of
>> another language data. I am attaching a sample image, text file as well as
>> the unicharset for reference.
>>
>> Thanks,
>> Shree
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "tesseract-dev" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/tesseract-dev/bRD21wf3GxQ/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> tesseract-dev+unsubscr...@googlegroups.com.
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
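
A side note on the magic number in that error: 20617 (0x5089) is exactly what
you get when the first two bytes of a PNG file (0x89, then 'P' = 0x50) are
read as a little-endian 16-bit integer. That suggests the .png files
themselves are fine and are only being handed to a TIFF reader. Below is a
minimal Python sketch for checking a file's signature yourself. The function
names are my own and are not part of tesseract or leptonica, and the
little-endian reading of the first two bytes is an assumption about how the
message is produced.

```python
import struct

# Well-known file signatures.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")  # little-endian, big-endian


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header[:4] in TIFF_SIGNATURES:
        return "tiff"
    return "unknown"


def first_two_bytes_as_le16(header: bytes) -> int:
    """Read the first two bytes as a little-endian unsigned 16-bit integer
    (assumed here to be how the reported 'magic number' is computed)."""
    return struct.unpack("<H", header[:2])[0]


def sniff_file(path: str) -> str:
    """Hypothetical helper: sniff the format of a file on disk."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))


if __name__ == "__main__":
    magic = first_two_bytes_as_le16(PNG_SIGNATURE)
    print(magic, hex(magic))  # 20617 0x5089
```

If sniff_file() reports "png" for san.mnt.exp013.png and the other failing
files, the images are valid PNGs and the problem is more likely in how the
build picks the reader than in the files.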
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups.com
To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en