Dmitry,
I am extremely thankful for your valuable guidance. It works for me. I have many things to learn from you.
With warmest regards,
-sriranga(78yrs)

On Thu, Feb 17, 2011 at 1:56 PM, Dmitry Silaev <[email protected]> wrote:
> Sriranga,
>
> > It is
> > presumed that the command line for (WinXP) should be as follows:
> > e.g. "c:\tess\copy 001.tr + 002.tr + 003.tr + 004.tr > 1234.tr or
> > Multiimage.tr", which may kindly be confirmed. Or the correct command line
> > to concatenate using the "copy" command may kindly be intimated.
>
> This command won't do what you want. First, you don't need to put a path
> before "copy", because it is a built-in command of the MS-DOS command
> processor; when prepended with a path, it is treated as the name of an
> executable in the "c:\tess\" directory, and no such executable exists.
> Second, you don't need the ">", because it redirects all informational
> output of the "copy" command (not the files' contents) to "1234.tr". The
> destination file should be specified at the end of the command, after a
> space. Therefore your command line should be
>
> copy 001.tr + 002.tr + 003.tr + 004.tr 1234.tr
>
> Warm regards,
> Dmitry Silaev
>
> On Thu, Feb 17, 2011 at 9:44 AM, Sriranga(78yrsold)
> <[email protected]> wrote:
> > Dmitry,
> > Thanks for the valuable guidance. However, I could not understand how to
> > concatenate (simply "copy") all the resulting .tr files together. It is
> > presumed that the command line for (WinXP) should be as follows:
> > e.g. "c:\tess\copy 001.tr + 002.tr + 003.tr + 004.tr > 1234.tr or
> > Multiimage.tr", which may kindly be confirmed. Or the correct command line
> > to concatenate using the "copy" command may kindly be intimated.
> > With Warmest Regards,
> > -sriranga(78yrs)
> >
> > On Wed, Feb 16, 2011 at 11:58 AM, Dmitry Silaev <[email protected]>
> > wrote:
> >>
> >> Guys,
> >>
> >> If you have more than one box/tiff pair, you can train (i.e. generate a
> >> .tr file) for each of these pairs separately.
> >>
> >> Then you can concatenate (simply "cat" or "copy") all resulting .tr files
> >> together and then run all training tools on the single final .tr file.
> >> This relieves you from the 32-file limit.
> >>
> >> For your convenience you can craft a batch file or shell script that
> >> would train, concatenate, cluster, etc. in one run. You should analyze all
> >> errors carefully, though.
> >>
> >> Warm regards,
> >> Dmitry Silaev
> >>
> >> On Wed, Feb 16, 2011 at 5:56 AM, Sriranga(78yrsold)
> >> <[email protected]> wrote:
> >>>
> >>> Dmitry,
> >>> It appears that Khem did not send a copy to you, so I am forwarding it
> >>> for your valuable guidance/comments, which may help me in my Kannada project.
> >>> With regards,
> >>> -sriranga(78yrs)
> >>>
> >>> ---------- Forwarded message ----------
> >>> From: KHEM Sochenda <[email protected]>
> >>> Date: Wed, Feb 16, 2011 at 7:45 AM
> >>> Subject: Re: Tesseract Training
> >>> To: "Sriranga(78yrsold)" <[email protected]>
> >>>
> >>> Dear Sriranga,
> >>>
> >>> Below are the steps I followed for the training:
> >>>
> >>> 1. I created 3 pages of training images, as you can see in the attachments
> >>>    (khm.limons1.1 is page 1, khm.limons1.2 is page 2, and khm.limons1.3 is
> >>>    page 3).
> >>> 2. I created box files for every page (khm.limons1.1.box and so on) with
> >>>    the command line
> >>>    "tesseract khm.limons1.1.tif khm.limons1.1 batch.nochop makebox" for
> >>>    page 1, "tesseract khm.limons1.2.tif khm.limons1.2 batch.nochop makebox"
> >>>    for page 2, and the same for page 3.
> >>> 3. Then I edited the box files; the final result is in the attachments.
> >>> 4. I merged the images together into a single file (khm.limons1.0.tif).
> >>> 5. I merged the three box files into a single box file with page numbers
> >>>    assigned (khm.limons1.0.box).
> >>> 6. I ran the command to train on the single file: "tesseract khm.limons1.1.tif
> >>>    khm.limons1.0.tif khm.limons1.0 nobatch box.train". The result looked okay
> >>>    at this step.
> >>>    (My purpose in merging everything into one file is that I want a single
> >>>    font to be in just one .tr file.)
> >>> 7. I then ran the command "unicharset_extractor khm.limons1.0.box" to
> >>>    extract every single glyph from the box files. The result looked okay.
> >>> 8. Then I tried running "mftraining -U unicharset -O khm.unicharset
> >>>    khm.limons1.0.tr" and "cntraining khm.limons1.0.tr" to extract the
> >>>    features. I failed at this step.
> >>>
> >>> --------------------------------------------------------------------------------------------------------
> >>> Since I have no clue how to get the above idea to work, I omitted steps 4
> >>> and 5 and skipped to steps 6, 7, and 8 using the separate box files, and I
> >>> got the traineddata in the attached file. Having three separate .tr files
> >>> is not what I want.
> >>>
> >>> Currently I use the obtained traineddata for my temporary OCR system.
> >>> What I wish to do is add other fonts, but the number of .tr files is
> >>> limited to 32 only... This is my concern.
> >>>
> >>> Best Regards,
> >>>
> >>> Sochenda

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
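
Dmitry's advice in the thread above boils down to joining the per-pair .tr files end to end, which `cat` on Unix or `copy a + b dest` on Windows does. For anyone who wants the same step without depending on the shell, here is a minimal Python sketch. It assumes the .tr files only need their bytes appended in order; the function name `concatenate_tr_files` and the script name `concat_tr.py` are hypothetical.

```python
import shutil
import sys
from pathlib import Path


def concatenate_tr_files(sources, destination):
    """Append the bytes of each source file, in order, to `destination`.

    Same effect as `cat 001.tr 002.tr > all.tr`: files are opened in binary
    mode, so their contents are copied unchanged and nothing is added.
    """
    destination = Path(destination)
    # Refuse to overwrite one of the inputs: opening it for writing would
    # truncate it before it is read.
    if any(Path(src).resolve() == destination.resolve() for src in sources):
        raise ValueError(f"destination {destination} is also a source file")
    with destination.open("wb") as out:
        for src in sources:
            with Path(src).open("rb") as f:
                shutil.copyfileobj(f, out)
    return destination


if __name__ == "__main__":
    # Hypothetical usage: python concat_tr.py 001.tr 002.tr 003.tr 004.tr 1234.tr
    if len(sys.argv) < 3:
        print("usage: concat_tr.py SOURCE.tr [SOURCE.tr ...] DEST.tr")
    else:
        *srcs, dest = sys.argv[1:]
        concatenate_tr_files(srcs, dest)
```

As with Dmitry's `copy` line, the destination comes last: `python concat_tr.py 001.tr 002.tr 003.tr 004.tr 1234.tr`.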

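Sochenda's step of merging the three per-page box files into a single box file "with page numbers assigned" is easy to get wrong by hand. This minimal Python sketch assumes the Tesseract 3.x box layout, where each line is `glyph left bottom right top page` and pages are counted from 0. It also assumes the box files are listed in the same order as the pages of the merged TIFF. `merge_box_files` is a hypothetical helper and not a Tesseract tool. The thread does not establish whether this step caused the later mftraining failure.

```python
from pathlib import Path


def merge_box_files(box_paths, merged_path):
    """Merge per-page box files into one multi-page box file.

    Assumption: each line is `glyph left bottom right top [page]` (the
    Tesseract 3.x layout), and pages are numbered from 0 in the order of
    `box_paths`, which must match the page order of the merged TIFF.
    Any existing page column is replaced. Returns the number of boxes written.
    """
    out_lines = []
    for page, box_path in enumerate(box_paths):
        text = Path(box_path).read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) not in (5, 6):
                raise ValueError(
                    f"{box_path}:{lineno}: expected 5 or 6 fields, got {len(fields)}"
                )
            glyph = fields[0]
            # int() rejects non-numeric coordinates, e.g. from a bad manual edit.
            coords = [str(int(v)) for v in fields[1:5]]
            out_lines.append(" ".join([glyph, *coords, str(page)]))
    Path(merged_path).write_text("\n".join(out_lines) + "\n", encoding="utf-8")
    return len(out_lines)
```

With the file names from the thread, the call would be `merge_box_files(["khm.limons1.1.box", "khm.limons1.2.box", "khm.limons1.3.box"], "khm.limons1.0.box")`.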

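Dmitry suggests a script that trains each pair, concatenates the results, and then clusters them in one run. Here is one way that could look in Python. It is a sketch under several assumptions. The tool names and arguments are copied from the commands quoted in this thread (`tesseract ... nobatch box.train`, `unicharset_extractor`, `mftraining -U unicharset -O ...`, `cntraining`), and other Tesseract versions may need different options. The names `plan_training` and `run_training`, as well as the output file names, are hypothetical. By default the script only prints the commands.

```python
import shutil
import subprocess


def plan_training(stems, combined_tr, lang):
    """Build the command lines for a train-concatenate-cluster run.

    `stems` are base names such as "khm.limons1.1"; each is assumed to have
    a matching .tif and edited .box file. Returns (train_cmds, post_cmds):
    train_cmds produce one .tr per pair, post_cmds run on the combined .tr.
    """
    train_cmds = [["tesseract", f"{s}.tif", s, "nobatch", "box.train"] for s in stems]
    post_cmds = [
        ["unicharset_extractor", *(f"{s}.box" for s in stems)],
        ["mftraining", "-U", "unicharset", "-O", f"{lang}.unicharset", combined_tr],
        ["cntraining", combined_tr],
    ]
    return train_cmds, post_cmds


def run_training(stems, combined_tr, lang, dry_run=True):
    """Print (and, unless dry_run, execute) every step in order.

    check=True stops at the first tool that returns a non-zero exit status,
    so its error output can be analyzed before going further.
    """
    train_cmds, post_cmds = plan_training(stems, combined_tr, lang)
    for cmd in train_cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    print(f"concatenate {len(stems)} .tr files -> {combined_tr}")
    if not dry_run:
        # Join the per-pair .tr files byte for byte, like `cat` or `copy`.
        with open(combined_tr, "wb") as out:
            for s in stems:
                with open(f"{s}.tr", "rb") as f:
                    shutil.copyfileobj(f, out)
    for cmd in post_cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical example: three page-level pairs khm.limons1.1 .. khm.limons1.3
    run_training([f"khm.limons1.{i}" for i in (1, 2, 3)], "khm.limons1.all.tr", "khm")
```

Printing the plan first makes it easy to compare against the commands that worked by hand before switching to `dry_run=False`.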