Sriranga,

> It is presumed that the command line for WinXP should be as follows:
> e.g. "c:\tess\copy 001.tr + 002.tr + 003.tr + 004.tr > 1234.tr" (or Multiimage.tr),
> which may kindly be confirmed. Or else, the correct command line for
> concatenating with the "copy" command may kindly be intimated.
This command won't do what you want. First, you don't need to put a path before "copy". It is a built-in command of the command processor (cmd.exe). When you prepend a path, it is treated as the name of an executable in the "c:\tess\" directory, and no such executable exists. Second, you don't need the ">". It would redirect the informational output of the "copy" command (not the files' contents) to "1234.tr". The destination file should be given at the end of the command, after a space. So, run from the directory that contains the .tr files (c:\tess in your case), your command line should be:

copy 001.tr + 002.tr + 003.tr + 004.tr 1234.tr

Warm regards,
Dmitry Silaev

On Thu, Feb 17, 2011 at 9:44 AM, Sriranga(78yrsold) wrote:
> Dmitry,
> Thanks for the valuable guidance. However, I could not understand how to
> concatenate (simply "copy") all the resulting .tr files together. It is
> presumed that the command line for WinXP should be as follows:
> e.g. "c:\tess\copy 001.tr + 002.tr + 003.tr + 004.tr > 1234.tr" (or Multiimage.tr),
> which may kindly be confirmed. Or else, the correct command line for
> concatenating with the "copy" command may kindly be intimated.
> With Warmest Regards,
> -sriranga(78yrs)
>
> On Wed, Feb 16, 2011 at 11:58 AM, Dmitry Silaev wrote:
>>
>> Guys,
>>
>> If you have more than one box/tiff pair, you can train (i.e. generate a
>> .tr file) for each of these pairs separately.
>>
>> Then you can concatenate (simply "cat" or "copy") all the resulting .tr files
>> together and run the rest of the training tools on the single final .tr file.
>> This frees you from the 32-file limit.
>>
>> For convenience, you can write a batch file or shell script that trains,
>> concatenates, clusters, etc. in one run. You should analyze all errors
>> carefully, though.
>>
>> Warm regards,
>> Dmitry Silaev
>>
>> On Wed, Feb 16, 2011 at 5:56 AM, Sriranga(78yrsold) wrote:
>>>
>>> Dmitry,
>>> It appears that Khem has not copied you, so I am forwarding this for your
>>> valuable guidance/comments, which may help me in my Kannada project.
>>> With regards,
>>> -sriranga(78yrs)
>>>
>>> ---------- Forwarded message ----------
>>> From: KHEM Sochenda
>>> Date: Wed, Feb 16, 2011 at 7:45 AM
>>> Subject: Re: Tesseract Training
>>> To: "Sriranga(78yrsold)"
>>>
>>> Dear Sriranga,
>>>
>>> Below are the steps I followed for the training:
>>>
>>> 1. I created 3 pages of training images, as you can see in the attachments
>>>    (khm.limons1.1 is page 1, khm.limons1.2 is page 2, and khm.limons1.3 is page 3).
>>> 2. I created box files for every page (khm.limons1.1.box and so on) with the
>>>    command line "tesseract khm.limons1.1.tif khm.limons1.1 batch.nochop makebox"
>>>    for page 1, "tesseract khm.limons1.2.tif khm.limons1.2 batch.nochop makebox"
>>>    for page 2, and the same for page 3.
>>> 3. Then I edited the box files; the final results are in the attachments.
>>> 4. I merged the images into a single file (khm.limons1.0.tif).
>>> 5. I merged the three box files into a single box file with page numbers
>>>    assigned (khm.limons1.0.box).
>>> 6. I ran the command to train the single file: "tesseract khm.limons1.1.tif
>>>    khm.limons1.0.tif khm.limons1.0 nobatch box.train". The result looked okay at
>>>    this step. (My purpose in merging into one file is that I want a single font
>>>    to be in just one .tr file.)
>>> 7. I then ran "unicharset_extractor khm.limons1.0.box" to extract every single
>>>    glyph from the box file. The result looked okay.
>>> 8. Then I tried running "mftraining -U unicharset -O khm.unicharset
>>>    khm.limons1.0.tr" and "cntraining khm.limons1.0.tr" to extract the features.
>>>    I failed at this step.
>>>
>>> --------------------------------------------------------------------
>>> Since I had no clue how to get the above idea to work, I omitted steps 4
>>> and 5 and skipped to steps 6, 7, and 8 using the separate box files. That
>>> produced the traineddata in the attached file. However, having three separate
>>> .tr files is not what I want.
>>>
>>> Currently I use the resulting traineddata for my temporary OCR system.
>>> What I wish to do is add other fonts, but the number of .tr files is
>>> limited to 32. This is my concern.
>>>
>>> Best Regards,
>>>
>>> Sochenda
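Dmitry's suggestion in this thread has three parts: train each box/tiff pair separately, concatenate the resulting .tr files, then run the clustering tools once on the combined file. It could be scripted roughly as below. This is a minimal Python sketch under stated assumptions, not a tested recipe. The stems, the language code "khm" and the output name "combined.tr" are hypothetical placeholders. The tools are assumed to be on PATH. The command forms are simply the ones quoted in this thread (Tesseract 3.0x era). The concatenation is byte-for-byte in a fixed order, which is what `cat` does. On Windows, `copy /b a.tr + b.tr out.tr` is the binary-mode equivalent; without /b, copy combines files in ASCII mode and may append an end-of-file (Ctrl+Z) character.

```python
"""Sketch: train each box/tiff pair, concatenate the .tr files, then cluster.

Assumptions (hypothetical, not from the thread): the stems passed in, the
language code, the combined file name, and that tesseract,
unicharset_extractor, mftraining and cntraining are on PATH. It is meant to
run in the directory that holds the .tif and .box files.
"""
import shutil
import subprocess
from pathlib import Path


def concatenate_tr_files(parts, destination):
    """Append the given files byte-for-byte, in order (like `cat` or `copy /b`)."""
    destination = Path(destination)
    with destination.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                shutil.copyfileobj(src, out)
    return destination


def build_commands(stems, combined_tr, lang):
    """Return (train, extract, cluster) command lists as argument vectors."""
    # One box.train run per pair: <stem>.tif + <stem>.box -> <stem>.tr
    train = [["tesseract", f"{s}.tif", s, "nobatch", "box.train"] for s in stems]
    # unicharset_extractor accepts several box files at once.
    extract = [["unicharset_extractor", *(f"{s}.box" for s in stems)]]
    # Clustering runs once, on the single concatenated .tr file.
    cluster = [
        ["mftraining", "-U", "unicharset", "-O", f"{lang}.unicharset", combined_tr],
        ["cntraining", combined_tr],
    ]
    return train, extract, cluster


def run_pipeline(stems, combined_tr="combined.tr", lang="khm"):
    """Run every step in order; check=True stops at the first failing command."""
    train, extract, cluster = build_commands(stems, combined_tr, lang)
    for cmd in train + extract:
        subprocess.run(cmd, check=True)
    concatenate_tr_files([f"{s}.tr" for s in stems], combined_tr)
    for cmd in cluster:
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(["khm.limons1.1", "khm.limons1.2", "khm.limons1.3"])` would train the three pages separately, combine their .tr files and cluster once. Because it stops at the first failing command, it is easier to see which step broke. The later steps (renaming the outputs with the language prefix and combining them into traineddata) are not covered here.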

