Re: Generating / Training box files.

Sriranga(77yrsold) Wed, 21 Apr 2010 07:05:50 -0700

Martin Pierre,

Could you give some guidance on how to use OCRB.Disambiguation.txt
effectively? A sample would be appreciated.


-sriranga(77yrsold)
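
For anyone who wants to quantify the difference described in the quoted
message below (cst versus eng output for eurotext.tif), here is a minimal
sketch in Python. It is not part of Tesseract: the function names are
hypothetical, and it only assumes the two output texts have already been
read into strings. It compares them word by word using the standard
difflib module.

```python
import difflib


def word_differences(reference_text, candidate_text):
    """Return (reference_words, candidate_words) pairs where the texts differ.

    Both arguments are plain strings, e.g. the contents of two OCR output
    files. Splitting on whitespace is an assumption that suits Latin-script
    text such as eurotext.tif.
    """
    ref_words = reference_text.split()
    cand_words = candidate_text.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the differing word runs from each side.
            differences.append((ref_words[i1:i2], cand_words[j1:j2]))
    return differences


def word_similarity(reference_text, candidate_text):
    """Return a similarity ratio between 0.0 and 1.0 over whitespace-split words."""
    ref_words = reference_text.split()
    cand_words = candidate_text.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    return matcher.ratio()
```

To use it, read the text produced with the eng traineddata and the text
produced with the cst traineddata into two strings and pass them as the
reference and the candidate. Each returned pair shows a run of words from
the reference and what the candidate contains in its place.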


On Wed, Apr 21, 2010 at 3:32 PM, Sriranga(77yrsold) <[email protected]> wrote:

> Dear Pierre,
> I tested OCRB.tif and eurotext.tif from the command line with Tesseract
> 3.0; the output of both is attached.
> For *OCRB.tif*, the output text produced with the *cst* traineddata
> (generated by you) and with the *eng* traineddata is identical and
> correct. For *eurotext.tif*, the two outputs are not identical: the text
> produced with *cst* has many misspellings compared with the text produced
> with *eng*.
>
> I bring these observations to your attention and would value your
> guidance on how to improve the accuracy of the output.
> With regards,
> -sriranga(77yrsold)
>
>
> On Tue, Apr 13, 2010 at 7:52 AM, MARTIN Pierre <[email protected]> wrote:
>
>> Dear Sriranga,
>>
>> Please confirm whether you have succeeded in training by using a
>> command line like
>> "tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.logfile"
>> [please note that the logfile is used on Windows platforms such as WinXP]
>> Kindly upload OCRB.tif so that I can get hands-on experience with it.
>>
>> Sure, but the files are too big. I'm going to create a compressed file so
>> you can see. I'll also include the batch files I've made (for Windows, but
>> the commands are pretty much the same for Linux/Unix).
>>
>> In this attachment:
>> - 7 batch files, named in the order they are run.
>> - Two pictures: the first (OCRBFull.tif) is from a Photoshop document I
>> made with the OCRB font; the other (OCRBReal.tif) is a patchwork of real
>> scanned data.
>> - You can delete everything inside the "Generated" folder; it's re-created
>> by the scripts, but I've included the files in the archive so you can see
>> what's created.
>> - A number of text files, which are the files actually needed for the new
>> traineddata format.
>>
>> Also, please note you'll need the "combine" binary. If you need any kind
>> of help compiling it, I've created a Visual Studio project for it.
>>
>> I've also completely cleaned up the SVN Visual Studio project. Everything
>> is now generated in only two folders (Debug, Release). Debugging
>> information is generated as well, and symbols are read properly when
>> debugging. Let me know if you or anyone else needs this too.
>>
>> I wanted to use your command line for an Indic language such as Kannada.
>>
>> Let me know if it worked for you then.
>>
>> Thanks for your research, Pierre.
>>
>> You're very welcome.
>>
>> Best,
>> Pierre.
>>
>>
>> --
>>
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<tesseract-ocr%[email protected]>
>> .
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>>
>>
>>
>
