Dear Pierre,
I tested with OCRB.tif and eurotext.tif, and the outputs are attached
herewith. I ran both TIFF files from the command line
using Tesseract version 3.0.
I observed that for *OCRB.tif* the output texts produced with the *cst*
(generated by you) and *eng* data files are identical and in order,
whereas for *eurotext.tif* the output texts produced with *cst* and *eng*
are not identical: the text generated using *cst* has many misspellings
when compared with the output text using the *eng* data files, viz. traineddata.
I bring the above observations to your kind attention for investigation and
request your valuable guidance on how to improve the accuracy of the output.
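To put a number on how far two outputs differ, a small Python sketch like the one below could be used. It is not part of Tesseract; the function names are made up for illustration, and the two sample strings are lines copied from the attached eurotext outputs (the cleaner one presumably from *eng*, the other from *cst*).

```python
import difflib


def char_similarity(reference: str, candidate: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means the texts are identical."""
    matcher = difflib.SequenceMatcher(None, reference, candidate, autojunk=False)
    return matcher.ratio()


def differing_lines(reference: str, candidate: str):
    """Return (line number, reference line, candidate line) for each differing pair.

    Lines are paired by position; extra lines in the longer text are ignored.
    """
    pairs = zip(reference.splitlines(), candidate.splitlines())
    return [(n, r, c) for n, (r, c) in enumerate(pairs, start=1) if r != c]


# Sample lines copied from the attached eurotext outputs (assumed eng vs. cst).
eng_out = "The (quick) [brown] {fox} jumps!\nOver the $43,456.78 <lazy> #90 dog\n"
cst_out = "The (quick) IbtoWnI U`oxI jumpsE\nOVet the $439456.78 <Iazy> #90 dog\n"

print(f"similarity: {char_similarity(eng_out, cst_out):.2f}")
for n, ref_line, cand_line in differing_lines(eng_out, cst_out):
    print(n, repr(ref_line), repr(cand_line))
```

In practice the two strings would be read from the .txt files that Tesseract writes for each run, so the same check can be repeated after every retraining attempt.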
With regards,
-sriranga(77yrsold)
On Tue, Apr 13, 2010 at 7:52 AM, MARTIN Pierre <[email protected]> wrote:
> Dear Sriranga,
>
> Please confirm whether you have succeeded in training by using your
> commandline like
> "tesseract OCRB.tif ./cst.OCRB.page001 nobatch box.train.logfile"
> [please note Logfile is used for Windows platform like winXP]
> Kindly upload OCRB.tif for hands on experience by me.
>
> Sure, but the files are too big. I'm going to create a compressed file, so
> you can see. I'll also include the batch files I've made (for Windows, but
> the commands are pretty much the same on *nix).
>
> In this attachment:
> - 7 batch files, named in order of run.
> - Two pictures: the first one (OCRBFull.tif) is from a Photoshop document I've
> made with the OCRB font; the other one (OCRBReal.tif) is a patchwork
> of real scanned data.
> - You can delete everything inside the "Generated" folder, it's re-created
> by the scripts, but i've included the files in the archive so you can see
> what's created.
> - A number of text files, which are the actually needed files for the new
> traineddata format.
>
> Also, please note you'll need the "combine" binary. If you need any kind of
> help regarding its compilation, I've created a Visual Studio project for
> it.
>
> Also, I've totally cleaned up the SVN Visual Studio project. Now everything
> is generated in only two folders (Debug, Release). Debugging information is
> made in such a case, and symbols are read properly when debugging. Let me
> know if you or anyone else needs this too.
>
> I wanted to use your commandline for Indic lang like Kannada.
>
> Let me know if it worked for you then.
>
> Thanks for your research, Pierre.
>
> You're very welcome.
>
> Best,
> Pierre.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>
>
>
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle” braune Fuchs springt
uber den faulen Hund. Le renard brun
<<rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre o cio preguieoso.
The (quick) IbtoWnI U`oxI jumpsE
OVet the $439456.78 <Iazy> #90 dog
6Z duck/goose. as I2.5'%D of E.maH
from aspammetCc.2websEte.com Is spam.
Det MschneHe9` btaune Fuchs spdngt
Ubet den 5au]en Hcmd. I.e tenatd bnm
<<tapEdeD saute pandessus Ie chEen
paresseux' La VoIpe man<me rapEda
saIta sopm U cane p5gto. EI zono
matt6D tépido saIta sobte eI petto
perezoso. A taposa manDm rzip5da
saHa sobre o céo pteguiq;oso.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz 0 ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.
The quick brown fox <12345>+67890=$500,00 jumps over {the} (Lazy) Edog].
Document #1 is the key to Tesseract Training;
abcd efgh ijkL mnop qrst uvwx yz U ABCD EFGH IJKL MNOP QRST UVWX YZ...
0123456789 0123456789
The quick brown & fow jumps over the `Lazy' dog.
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
123 = 456 * 789 / 0,8 % 7.5
J'ai mangé é La cantine.