On Thu, Dec 20, 2012 at 3:25 PM, Patrick Questembert <
patrick.questemb...@gmail.com> wrote:

> Update: the Suzuki cook-book for building on iOS still works, see
> https://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/
>
> About performance: we have observed only a relatively small performance
> gain with Tesseract 3.02 versus Tesseract 3.01 - something like 10s versus
> 11s - this is far from some estimates I have posted here quoting 2x
> performance improvements. Furthermore, I experimented with Tesseract 3.02
> running with the Tess 3.01 training set and:
> 1. There are more accuracy errors: this is something Ray mentioned should
> be expected if running the new code with the old training set, because the
> new eng.traineddata has essential additional information for each character.
> 2. Speed is MUCH improved in that weird combination - 5.8s versus 11s! I
> am guessing that's because the missing additional data for characters
> causes new algorithms to be skipped. Can anyone shed some light?
>
> In general, does anyone know what's new in the Tess 3.02 training set? Is
> it the same set of fonts with extra data or does it add many new fonts?
>
A short description of the 3.02 language data components is on the
combine_tessdata manual page [1].
Walking through the svn revisions, you can find these changes:

   - r526 [2] added support for the fixed-length-dawgs, cube-unicharset and
   cube-word-dawg components
   - r658 [3] added support for the shapetable, bigram-dawg, unambig-dawg and
   params-training-model components


As far as I have checked, unambig-dawg and params-training-model are not
present in any of the available language data files.
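
If you want to check which components a given traineddata file contains,
something along these lines might help. It is only a rough sketch: it assumes
combine_tessdata is on your PATH and relies on its -u (unpack) option, which
writes one file per component using the given path prefix. The function names
and the "eng." prefix are made up for this example.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def component_suffixes(directory, prefix="eng."):
    """Return the sorted suffixes of files named <prefix><suffix> in directory."""
    return sorted(
        p.name[len(prefix):]
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


def unpack_components(traineddata, prefix="eng."):
    """Unpack a traineddata file and list the component suffixes written out.

    Assumes the combine_tessdata binary is installed and on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # combine_tessdata -u <traineddata file> <output path prefix>
        subprocess.run(
            ["combine_tessdata", "-u", str(traineddata), os.path.join(tmp, prefix)],
            check=True,
        )
        return component_suffixes(tmp, prefix)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(unpack_components(sys.argv[1]))
```

Running it on the 3.01 and the 3.02 eng.traineddata and comparing the two
lists should show which components only exist in the newer file.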

[1]
http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html#_components
[2]
https://code.google.com/p/tesseract-ocr/source/diff?path=/trunk/ccutil/tessdatamanager.h&format=side&r=526&old_path=/trunk/ccutil/tessdatamanager.h&old=441
[3]
https://code.google.com/p/tesseract-ocr/source/diff?path=/trunk/ccutil/tessdatamanager.h&format=side&r=658


Zdenko



> Patrick
>
>
> On Monday, December 17, 2012 11:36:04 AM UTC-5, Patrick Questembert wrote:
>>
>> We are using Tesseract 3.01 and about to take the plunge to Tesseract
>> 3.02, and would appreciate a couple of pointers:
>> - latest iOS / i386 "cook book" to build Tesseract for use in iOS app and
>> iOS Simulator (which means arm + 386)?
>> - has anyone compared performance? I read an estimate of 2x faster for
>> 3.02 but is that still true even though the training set is 21.9 MB for
>> 3.02 versus 3.1 MB for 3.01? Surely this has got to slow things down (even
>> if it means higher accuracy).
>>
>> Thanks!
>> Patrick
>>
>  --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
