Martin Pierre, could you give guidance on how to use OCRB.Disambiguation.txt effectively? A sample would be appreciated.
-sriranga(77yrsold)

On Wed, Apr 21, 2010 at 3:32 PM, Sriranga(77yrsold) <[email protected]> wrote:
> Dear Pierre,
> I tested using OCRB.tif and eurotext.tif, and the outputs are attached
> herewith. I used the command line for both TIFs with Tesseract 3.0.
> I observed that the output texts for *OCRB.tif* using the *cst* (generated
> by you) and *eng* data files are identical and correct, whereas the output
> texts for *eurotext.tif* using the *cst* and *eng* data files are not
> identical: the text generated using *cst* has many misspellings compared
> with the output generated using the *eng* traineddata.
>
> I bring these observations to your kind attention and would value your
> guidance on how to improve the accuracy of the output.
> With regards,
> -sriranga(77yrsold)
>
>
> On Tue, Apr 13, 2010 at 7:52 AM, MARTIN Pierre <[email protected]> wrote:
>
>> Dear Sriranga,
>>
>> Please confirm whether you have succeeded in training by using your
>> command line like
>> "tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.logfile"
>> [please note Logfile is used for Windows platforms like WinXP]
>> Kindly upload OCRB.tif so I can get hands-on experience with it.
>>
>> Sure, but the files are too big. I'm going to create a compressed file so
>> you can see. I'll also include the batch files I've made (for Windows, but
>> the commands are pretty much the same for Linux/Unix).
>>
>> In this attachment:
>> - 7 batch files, named in the order they are run.
>> - Two pictures: the first one (OCRBFull.tif) is from a Photoshop document
>> I made with the OCR-B font; the other one (OCRBReal.tif) is a patchwork
>> of real scanned data.
>> - You can delete everything inside the "Generated" folder; it's re-created
>> by the scripts, but I've included the files in the archive so you can see
>> what's created.
>> - A number of text files, which are the files actually needed for the new
>> traineddata format.
>>
>> Also, please note you'll need the "combine" binary.
>> If you need any kind of help with its compilation, I've created a Visual
>> Studio project for it.
>>
>> Also, I've completely cleaned up the SVN Visual Studio project. Now
>> everything is generated in only two folders (Debug, Release). Debugging
>> information is generated, and symbols are loaded properly when debugging.
>> Let me know if you or anyone else needs this too.
>>
>> I wanted to use your command line for Indic languages like Kannada.
>>
>> Let me know if it worked for you then.
>>
>> Thanks for your research, Pierre.
>>
>> You're very welcome.
>>
>> Best,
>> Pierre.
>>
>>
>> --
>>
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<tesseract-ocr%[email protected]>.
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
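The thread notes that the new 3.0 traineddata format is built from a set of component files using the "combine" binary (combine_tessdata, which takes a language prefix such as `cst.`). Below is a minimal Python sketch of a pre-flight check before running it. It assumes the Tesseract 3.0x component names listed below; verify them against tessdatamanager.h for the version you build. The helper names and the required/optional split are hypothetical and are not part of Tesseract.

```python
from pathlib import Path

# Assumed Tesseract 3.0x component suffixes (files are named "<lang>.<suffix>").
# The required/optional split here is an assumption for illustration only.
REQUIRED_SUFFIXES = ["unicharset", "inttemp", "pffmtable", "normproto"]
OPTIONAL_SUFFIXES = ["config", "unicharambigs", "punc-dawg", "word-dawg",
                     "number-dawg", "freq-dawg"]


def check_components(directory, lang):
    """Hypothetical helper: return (missing required suffixes, present optional suffixes)."""
    base = Path(directory)
    missing = [s for s in REQUIRED_SUFFIXES
               if not (base / f"{lang}.{s}").is_file()]
    present_optional = [s for s in OPTIONAL_SUFFIXES
                        if (base / f"{lang}.{s}").is_file()]
    return missing, present_optional


def combine_command(directory, lang):
    """Hypothetical helper: build the argument list for combine_tessdata,
    which is given the file prefix '<lang>.' (note the trailing dot)."""
    return ["combine_tessdata", str(Path(directory) / lang) + "."]
```

For example, if `check_components("tessdata", "cst")` reports nothing missing, you could pass `combine_command("tessdata", "cst")` to `subprocess.run(..., check=True)`. The sketch deliberately does not launch the binary itself, because where combine_tessdata lives on Windows versus Linux/Unix depends on your build.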

