Hi,

Where can I download dawg2wordlist.exe? Does it work for all dawg files, e.g. punc, wordlist, frequent_words, etc.?
Kind regards,
Richard

On Thursday, March 8, 2012 6:48:49 PM UTC+11, sriranga(79yrsold) wrote:
>
> I forgot to inform that I have already generated lib_debug
> version "dawg2wordlistd.exe" - when run in CMD same message displayed for
> lib_release version exe file.
>
> On Thu, Mar 8, 2012 at 12:50 PM, Sriranga(78yrsold) <[email protected]> wrote:
>
>> Reg:kan.traineddata: revised screenshot attached for perusal.
>> Successfully generated output for the kan.punc-dawg and kan.numeric-dawg.
>> I could not understand why should failed for kan.word-dawg and
>> kan.freq.dawg only?.
>>
>> Reg tel.traineddata: - tested. Same problem similar/identical to kan
>> faced by me.
>>
>> Since I am not programmer nor developer- difficult to understand and
>> follow.
>>
>> Re: tessdata: - no problem with tessdata folder - since it is located
>> above all exe files.
>> able to generate traineddata files.
>> With regards,
>> -sriranga(79yrs)
>>
>> On Thu, Mar 8, 2012 at 11:35 AM, TP <[email protected]> wrote:
>>
>>> On Wed, Mar 7, 2012 at 7:55 PM, Sriranga(78yrs) <[email protected]> wrote:
>>> > David,
>>> > Thank you for the valuable guidance. I followed your steps still problem of
>>> > window's exe encounter - vide screenshot is attached. WinXP(sp3) tesseract
>>> > -r-700
>>> > With warmest regards,
>>> > -sriranga(79yrs)
>>> >
>>> > On Thu, Mar 8, 2012 at 12:42 AM, David Eger <[email protected]> wrote:
>>> >>
>>> >> $ combine_tessdata -u ./third_party/tesseract/tessdata/kan.traineddata ./kan.
>>> >> Extracting tessdata components from ./third_party/tesseract/tessdata/kan.traineddata
>>> >> Wrote ./kan.unicharset
>>> >> Wrote ./kan.inttemp
>>> >> Wrote ./kan.pffmtable
>>> >> Wrote ./kan.normproto
>>> >> Wrote ./kan.punc-dawg
>>> >> Wrote ./kan.word-dawg
>>> >> Wrote ./kan.number-dawg
>>> >> Wrote ./kan.freq-dawg
>>> >>
>>> >> $ ls kan.*
>>> >> kan.freq-dawg kan.inttemp kan.normproto kan.number-dawg
>>> >> kan.pffmtable kan.punc-dawg kan.unicharset kan.word-dawg
>>> >>
>>> >> $ dawg2wordlist kan.unicharset kan.word-dawg word.wordlist
>>> >> Loading word list from kan.word-dawg
>>> >> Reading squished dawg
>>> >> Word list loaded.
>>> >>
>>> >> $ wc -l word.wordlist
>>> >> 18720 word.wordlist
>>> >>
>>> >> Looks like there are 18,720 words in the Kannada word dawg, safely
>>> >> uncompressed...
>>> >>
>>> >> On Mar 7, 8:43 am, "Sriranga(78yrs)" <withblessing.sriranga.[email protected]> wrote:
>>> >> > David,
>>> >> > just now I checked with kan.punc-dawg(1KB) and kan.number-dawg(1KB) also.
>>> >> > it works fine In both cases the output were not empty. Only
>>> >> > word-dawg(181KB) and freq-dawg(2KB) does not work but with M$ windows's exe
>>> >> > encounter message were displayed.
>>> >> > this is brought to your kind notice. Even attached files of kan.word-dawg
>>> >> > and kan.freq.dawg - for your investigation and valuable guidance.
>>> >> > With warmest regards,
>>> >> > -sriranga(79yrs)
>>> >> >
>>> >> > On Wed, Mar 7, 2012 at 9:44 AM, Sriranga(78yrs) <[email protected]> wrote:
>>> >> > > David,
>>> >> > > Thanks for the valuable guidance.
>>> >> > > Copied dawg2wordlist.exe pasted in the folder n:\Newfolder\ wherein
>>> >> > > extracted files Kan.unicharset, kan.word-dawg, kan.freq-dawg are located.
>>> >> > >
>>> >> > > extract of cmd is reproduced below - with encounter.exe windows messages
>>> >> > > displayed for word-dawg and freq-dawg.
>>> >> > > M:\New Folder>dawg2wordlist.exe -h
>>> >> > > Print all the words in a given dawg.
>>> >> > > Usage: dawg2wordlist.exe <unicharset> <dawgfile> <wordlistfile>
>>> >> > >
>>> >> > > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.word-dawg testwordlist
>>> >> > > Loading word list from kan.word-dawg
>>> >> > > Reading squished dawg
>>> >> > >
>>> >> > > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.freq-dawg testwordlist
>>> >> > > Loading word list from kan.freq-dawg
>>> >> > > Reading squished dawg
>>> >> > > Word list loaded.
>>> >> > > M:\New Folder>
>>> >> > >
>>> >> > > [Note: testwordlist contains 0(zero)kb for kan.freq-dawg which contains 2KB -
>>> >> > > whereas testwordlist did not generate for kan.word-dawg which contains 181KB]
>>> >> > > Awaiting further valuable guidance.
>>> >> > > With regards,
>>> >> > > -sriranga(79yrs)
>>> >> > >
>>> >> > > Still i could not understand where I made mistake?
>>> >> > > With regards,
>>> >> > > -sriranga(79yrs)
>>> >> > >
>>> >> > > On Wed, Mar 7, 2012 at 2:41 AM, David Eger <[email protected]> wrote:
>>> >> > >
>>> >> > >> Where you put wordlist2dawg.exe, try putting the name of the output list
>>> >> > >> instead.
>>> >> > >>
>>> >> > >> On Friday, March 2, 2012 2:39:33 AM UTC-8, sriranga(79yrsold) wrote:
>>> >> > >>
>>> >> > >>> I had extracted kan.word-dawg from the Kan.traineddata. I am trying to
>>> >> > >>> convert dawg to wordlist using commandline in cmd as follows:
>>> >> > >>>
>>> >> > >>> M:\r684\BuildFolder\tesseract-ocr>dawg2wordlist "m:\New Folder\kan.unicharset" "m:\New Folder\kan.word-dawg" wordlist2dawg.exe
>>> >> > >>> Loading word list from m:\New Folder\kan.word-dawg
>>> >> > >>> Reading squished dawg
>>> >> > >>>
>>> >> > >>> M:\r684\BuildFolder\tesseract-ocr>
>>> >> > >>>
>>> >> > >>> Unfortunately windows encounter exe displayed. Where I made a mistake?
>>> >> > >>> Awaiting solution?
>>> >> >
>>> >> > Attachments:
>>> >> > kan.word-dawg (243K)
>>> >> > kan.freq-dawg (2K)
>>> >> > kan.punc-dawg (< 1K)
>>> >> > kan.number-dawg (< 1K)
>>>
>>> Just looking at that screenshot you supplied, it starts with a ERROR
>>> message about TESSDATA_PREFIX not correctly pointing to the parent
>>> folder of TESSDATA folder?
>>>
>>> Have you fixed this by setting TESSDATA_PREFIX? This is prominently
>>> mentioned in the README [1] It should now probably point at your SVN
>>> working directory (and make sure it ends with a / character).
>>>
>>> And sorry to say, if you keep running into problems like this, you
>>> might want to think about learning to use the Visual Studio 2008
>>> Debugger :) It's pretty easy, and very handy for figuring out exactly
>>> where a program crashes.
>>>
>>> 1) You already know how to build tesseract with VS, so just set your
>>> build configuration to LIB_Debug (when debugging the training apps).
>>>
>>> 2) Make the training app project (in this case dawg2wordlist) you are
>>> trying to debug, the Default Startup project (by right clicking it
>>> and choosing Set as Startup Project).
>>>
>>> 3) Open up the training app project's properties (by again
>>> right-clicking it and choosing Properties).
>>>
>>> 4) Make sure at the top Configuration: is LIB_Debug.
>>>
>>> 5) In the Configuration Properties | Debugging Category, set the
>>> following fields:
>>>
>>> Command Arguments: (whatever you specified on the command line) so set
>>> it to:
>>>
>>> kan.unicharset kan.word-dawg word.wordlist
>>>
>>> Working Directory should be your working directory so:
>>>
>>> M:\New Folder\New Folder
>>>
>>> (a terrible name for folders BTW :P )
>>>
>>> 6) Now for the exciting part, right-click the dawg2wordlist project and
>>> choose Debug -> Start new instance from the popup menu.
>>>
>>> A new command window will show up (possible hidden by Visual Studio),
>>> displaying all of dawg2wordlist's output.
>>>
>>> When the program crashes, you should see a window in the debugger that
>>> shows exactly where the program was when it crashed and what the error
>>> reason is. From that either you (hopefully) or we can better figure
>>> out what is going wrong.
>>>
>>> [1] http://code.google.com/p/tesseract-ocr/wiki/ReadMe
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
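
For anyone wanting to try every dawg file in one go, the extract-then-convert steps quoted above could be wrapped in a small loop. This is only a rough sketch for a POSIX shell (the failing runs in the thread were in Windows cmd, so it would need adapting there). The component names come from the combine_tessdata output quoted above and the argument order from the tool's own usage line; the function name convert_dawgs, the DRY_RUN switch and the output file names are my own placeholders, not part of Tesseract. It does not address the crash itself.

```shell
#!/bin/sh
# Sketch only: convert each extracted dawg of one language to a word list.
# Assumes "combine_tessdata -u" has already written <lang>.unicharset and the
# <lang>.*-dawg files into the current directory, and dawg2wordlist is on PATH.

# convert_dawgs LANG
# DRY_RUN is a hypothetical switch: when non-empty, the commands are only printed.
convert_dawgs() {
    lang=$1
    # Component names as listed in the combine_tessdata output quoted above.
    for kind in punc word number freq; do
        dawg="$lang.$kind-dawg"
        list="$lang.$kind.wordlist"   # hypothetical output file name
        if [ -n "${DRY_RUN:-}" ]; then
            echo "dawg2wordlist $lang.unicharset $dawg $list"
        elif [ ! -f "$dawg" ]; then
            echo "skipping $dawg: file not found" >&2
        else
            # Argument order per the tool's help: <unicharset> <dawgfile> <wordlistfile>
            dawg2wordlist "$lang.unicharset" "$dawg" "$list" ||
                echo "dawg2wordlist failed for $dawg" >&2
        fi
    done
}
```

Running `DRY_RUN=1 convert_dawgs kan` just prints the four commands, so they can be checked before anything is executed.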

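
On the TESSDATA_PREFIX advice quoted above (point it at the parent folder of tessdata, and make sure it ends with a / character), a quick sanity check could look something like the following. Again a POSIX-shell sketch under my own assumptions: check_tessdata_prefix is a made-up helper name, and on Windows cmd the same checks would have to be done by hand.

```shell
#!/bin/sh
# Sketch only: sanity-check TESSDATA_PREFIX as described in the quoted advice.
# check_tessdata_prefix is a hypothetical helper, not a Tesseract tool.
check_tessdata_prefix() {
    prefix=${TESSDATA_PREFIX:-}
    if [ -z "$prefix" ]; then
        echo "TESSDATA_PREFIX is not set"
        return 1
    fi
    # The prefix should end with a / character.
    case $prefix in
        */) ;;
        *)  echo "TESSDATA_PREFIX should end with /"
            return 1 ;;
    esac
    # The prefix should be the parent folder of the tessdata folder.
    if [ ! -d "${prefix}tessdata" ]; then
        echo "no tessdata folder under $prefix"
        return 1
    fi
    echo "ok: ${prefix}tessdata"
}
```

Calling `check_tessdata_prefix` before running the training tools reports which of the three conditions fails, if any.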
