Hi,

Where can I download dawg2wordlist.exe?
Does it work for all dawg files, e.g. punc, wordlist, frequent_words, etc.?

kind regards

Richard

On Thursday, March 8, 2012 6:48:49 PM UTC+11, sriranga(79yrsold) wrote:
>
> I forgot to mention that I have already built the lib_debug version,
> "dawg2wordlistd.exe". When run in CMD, it displays the same message as the
> lib_release version of the exe.
>
> On Thu, Mar 8, 2012 at 12:50 PM, Sriranga(78yrsold) <
> [email protected]> wrote:
>
>> Re kan.traineddata: revised screenshot attached for perusal.
>> Output was generated successfully for kan.punc-dawg and kan.number-dawg.
>> I cannot understand why it fails only for kan.word-dawg and
>> kan.freq-dawg.
>>
>> Re tel.traineddata: tested. I ran into the same problem as with kan.
>>
>> Since I am neither a programmer nor a developer, this is difficult for me
>> to understand and follow.
>>
>> Re tessdata: there is no problem with the tessdata folder, since it is
>> located above all the exe files. I am able to generate traineddata files.
>> With regards,
>> -sriranga(79yrs)
>>
>>
>>
>>
>>
>> On Thu, Mar 8, 2012 at 11:35 AM, TP <[email protected]> wrote:
>>
>>> On Wed, Mar 7, 2012 at 7:55 PM, Sriranga(78yrs)
>>> <[email protected]> wrote:
>>> > David,
>>> > Thank you for the valuable guidance. I followed your steps but still
>>> > get the Windows error dialog for the exe ("encountered a problem");
>>> > a screenshot is attached. WinXP (SP3), tesseract -r-700
>>> > With warmest regards,
>>> > -sriranga(79yrs)
>>> >
>>> >
>>> > On Thu, Mar 8, 2012 at 12:42 AM, David Eger <[email protected]> wrote:
>>> >>
>>> >> $ combine_tessdata -u ./third_party/tesseract/tessdata/kan.traineddata ./kan.
>>> >> Extracting tessdata components from ./third_party/tesseract/tessdata/kan.traineddata
>>> >> Wrote ./kan.unicharset
>>> >> Wrote ./kan.inttemp
>>> >> Wrote ./kan.pffmtable
>>> >> Wrote ./kan.normproto
>>> >> Wrote ./kan.punc-dawg
>>> >> Wrote ./kan.word-dawg
>>> >> Wrote ./kan.number-dawg
>>> >> Wrote ./kan.freq-dawg
>>> >>
>>> >> $ ls kan.*
>>> >> kan.freq-dawg  kan.inttemp  kan.normproto  kan.number-dawg
>>> >>  kan.pffmtable  kan.punc-dawg  kan.unicharset  kan.word-dawg
>>> >>
>>> >> $ dawg2wordlist kan.unicharset kan.word-dawg word.wordlist
>>> >> Loading word list from kan.word-dawg
>>> >> Reading squished dawg
>>> >> Word list loaded.
>>> >>
>>> >> $ wc -l word.wordlist
>>> >> 18720 word.wordlist
>>> >>
>>> >> Looks like there are 18,720 words in the Kannada word dawg, safely
>>> >> uncompressed...
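
To repeat the steps above for every dawg file rather than one at a time, here is a minimal shell sketch. It assumes a POSIX shell, that combine_tessdata -u has already unpacked the components into the current directory, and that dawg2wordlist is on the PATH. The function name dump_dawgs and the DAWG2WORDLIST override variable are my own inventions for illustration, not part of Tesseract.

```shell
#!/bin/sh
# Sketch: convert every extracted <lang>.*-dawg file in the current
# directory to a plain word list and report how many lines each produced.
# DAWG2WORDLIST is a hypothetical override for the tool's location.
dump_dawgs() {
    lang=$1
    tool=${DAWG2WORDLIST:-dawg2wordlist}
    for dawg in "$lang".*-dawg; do
        [ -e "$dawg" ] || continue      # the glob matched nothing
        out="$dawg.wordlist"
        # Argument order follows the tool's own usage line:
        #   <unicharset> <dawgfile> <wordlistfile>
        if "$tool" "$lang.unicharset" "$dawg" "$out" >/dev/null && [ -s "$out" ]; then
            echo "$dawg: $(wc -l < "$out" | tr -d ' ') lines"
        else
            # A crash or an empty output file both end up here.
            echo "$dawg: FAILED or empty output" >&2
        fi
    done
}
```

After sourcing this, running `dump_dawgs kan` would write kan.word-dawg.wordlist and so on next to each dawg file. Going by the reports in this thread, the punc and number dawgs converted fine on Windows while word-dawg and freq-dawg did not, so a FAILED line for those two would match what sriranga saw.
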
>>> >>
>>> >>
>>> >>
>>> >> On Mar 7, 8:43 am, "Sriranga(78yrs)" <withblessing.sriranga.
>>> >> [email protected]> wrote:
>>> >> > David,
>>> >> > Just now I also checked with kan.punc-dawg (1KB) and
>>> >> > kan.number-dawg (1KB).
>>> >> > It works fine: in both cases the output was not empty. Only
>>> >> > word-dawg (181KB) and freq-dawg (2KB) do not work; for these the
>>> >> > MS Windows error dialog for the exe ("encountered a problem") was
>>> >> > displayed.
>>> >> > I bring this to your kind notice. I have also attached
>>> >> > kan.word-dawg and kan.freq-dawg for your investigation and valuable
>>> >> > guidance.
>>> >> > With warmest regards,
>>> >> > -sriranga(79yrs)
>>> >> >
>>> >> > On Wed, Mar 7, 2012 at 9:44 AM, Sriranga(78yrs) <[email protected]> wrote:
>>> >> > > David,
>>> >> > > Thanks for the valuable guidance.
>>> >> > > I copied dawg2wordlist.exe into the folder n:\Newfolder\, where the
>>> >> > > extracted files Kan.unicharset, kan.word-dawg and kan.freq-dawg are
>>> >> > > located.
>>> >> >
>>> >> > > An extract of the cmd session is reproduced below. The Windows
>>> >> > > error dialog for the exe ("encountered a problem") was displayed
>>> >> > > for word-dawg and freq-dawg.
>>> >> > > M:\New Folder>dawg2wordlist.exe -h
>>> >> > > Print all the words in a given dawg.
>>> >> > > Usage: dawg2wordlist.exe <unicharset> <dawgfile> <wordlistfile>
>>> >> >
>>> >> > > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.word-dawg testwordlist
>>> >> > > Loading word list from kan.word-dawg
>>> >> > > Reading squished dawg
>>> >> >
>>> >> > > M:\New Folder>dawg2wordlist.exe kan.unicharset kan.freq-dawg testwordlist
>>> >> > > Loading word list from kan.freq-dawg
>>> >> > > Reading squished dawg
>>> >> > > Word list loaded.
>>> >> > > M:\New Folder>
>>> >> >
>>> >> > > [Note: for kan.freq-dawg (2KB), testwordlist is created but is
>>> >> > > 0 KB; for kan.word-dawg (181KB), testwordlist is not generated
>>> >> > > at all.]
>>> >> > > Awaiting further valuable guidance.
>>> >> > > With regards,
>>> >> > > -sriranga(79yrs)
>>> >> >
>>> >> > > I still cannot understand where I made a mistake.
>>> >> > > With regards,
>>> >> > > -sriranga(79yrs)
>>> >> >
>>> >> > > On Wed, Mar 7, 2012 at 2:41 AM, David Eger <[email protected]>
>>> >> > > wrote:
>>> >> >
>>> >> > >> Where you put wordlist2dawg.exe, try putting the name of the
>>> >> > >> output list instead.
>>> >> >
>>> >> > >> On Friday, March 2, 2012 2:39:33 AM UTC-8, sriranga(79yrsold) wrote:
>>> >> >
>>> >> > >>> I extracted kan.word-dawg from Kan.traineddata. I am trying to
>>> >> > >>> convert the dawg to a word list from the command line in cmd, as
>>> >> > >>> follows:
>>> >> >
>>> >> > >>> M:\r684\BuildFolder\tesseract-ocr>dawg2wordlist "m:\New Folder\kan.unicharset" "m:\New Folder\kan.word-dawg" wordlist2dawg.exe
>>> >> > >>> Loading word list from m:\New Folder\kan.word-dawg
>>> >> > >>> Reading squished dawg
>>> >> >
>>> >> > >>> M:\r684\BuildFolder\tesseract-ocr>
>>> >> >
>>> >> > >>> Unfortunately, the Windows error dialog for the exe ("encountered
>>> >> > >>> a problem") was displayed. Where did I make a mistake? Awaiting a
>>> >> > >>> solution.
>>> >> >
>>> >> >
>>> >> > Attachments:
>>> >> >  kan.word-dawg (243K)
>>> >> >  kan.freq-dawg (2K)
>>> >> >  kan.punc-dawg (< 1K)
>>> >> >  kan.number-dawg (< 1K)
>>>
>>> Just looking at the screenshot you supplied, it starts with an ERROR
>>> message about TESSDATA_PREFIX not correctly pointing to the parent
>>> folder of the tessdata folder.
>>>
>>> Have you fixed this by setting TESSDATA_PREFIX? This is prominently
>>> mentioned in the README [1]. It should now probably point at your SVN
>>> working directory (and make sure it ends with a / character).
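
The TESSDATA_PREFIX advice above can be turned into a quick sanity check. This is only a sketch under assumptions: it is written for a POSIX shell (so on Windows it would need something like Cygwin or MSYS), and the function name check_tessdata_prefix is hypothetical. It tests just the two conditions mentioned in the message: the value ends with a / and its tessdata subfolder exists.

```shell
#!/bin/sh
# Sketch: verify that a TESSDATA_PREFIX value names the parent folder of
# tessdata and ends with a slash. Takes the value as $1, or falls back
# to the TESSDATA_PREFIX environment variable when no argument is given.
check_tessdata_prefix() {
    prefix=${1-${TESSDATA_PREFIX-}}
    if [ -z "$prefix" ]; then
        echo "TESSDATA_PREFIX is not set" >&2
        return 1
    fi
    case $prefix in
        */) ;;                          # ends with a slash, as advised
        *)  echo "TESSDATA_PREFIX should end with /: $prefix" >&2
            return 1 ;;
    esac
    if [ ! -d "${prefix}tessdata" ]; then
        echo "no tessdata directory under $prefix" >&2
        return 1
    fi
    echo "ok: ${prefix}tessdata"
}
```

Calling `check_tessdata_prefix` with no arguments checks the current environment. A non-zero exit status together with the message on stderr indicates which of the conditions failed.
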
>>>
>>> And sorry to say, if you keep running into problems like this, you
>>> might want to think about learning to use the Visual Studio 2008
>>> Debugger :) It's pretty easy, and very handy for figuring out exactly
>>> where a program crashes.
>>>
>>> 1) You already know how to build tesseract with VS, so just set your
>>>   build configuration to LIB_Debug (when debugging the training apps).
>>>
>>> 2) Make the training app project you are trying to debug (in this
>>>   case dawg2wordlist) the default Startup Project (by right-clicking
>>>   it and choosing Set as Startup Project).
>>>
>>> 3) Open up the training app project's properties (by again
>>>   right-clicking it and choosing Properties).
>>>
>>> 4) Make sure at the top Configuration: is LIB_Debug.
>>>
>>> 5) In the Configuration Properties | Debugging Category, set the
>>>   following fields:
>>>
>>>   Command Arguments: (whatever you specified on the command line) so
>>>   set it to:
>>>
>>>      kan.unicharset kan.word-dawg word.wordlist
>>>
>>>   Working Directory should be your working directory so:
>>>
>>>      M:\New Folder\New Folder
>>>
>>>   (a terrible name for folders BTW :P )
>>>
>>> 6) Now for the exciting part, right-click the dawg2wordlist project and
>>>   choose Debug -> Start new instance from the popup menu.
>>>
>>> A new command window will show up (possibly hidden by Visual Studio),
>>> displaying all of dawg2wordlist's output.
>>>
>>> When the program crashes, you should see a window in the debugger that
>>> shows exactly where the program was when it crashed and what the error
>>> reason is. From that either you (hopefully) or we can better figure
>>> out what is going wrong.
>>>
>>> [1] http://code.google.com/p/tesseract-ocr/wiki/ReadMe
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>
>
>

