On Thu, Dec 20, 2012 at 3:25 PM, Patrick Questembert <patrick.questemb...@gmail.com> wrote:
> Update: the Suzuki cook-book for building on iOS still works, see
> https://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/
>
> About performance: we have observed only a relatively small performance
> gain with Tesseract 3.02 versus Tesseract 3.01 - something like 10s versus
> 11s - this is far from some estimates I have posted here quoting 2x
> performance improvements. Furthermore, I experimented with Tesseract 3.02
> running with the Tess 3.01 training set and:
> 1. There are more accuracy errors: this is something Ray mentioned should
> be expected if running the new code with the old training set, because the
> new eng.traineddata has essential additional information for each character.
> 2. Speed is MUCH improved in that weird combination - 5.8s versus 11s! I
> am guessing that's because the missing additional data for characters
> causes new algorithms to be skipped. Can anyone shed some light?
>
> In general, does anyone know what's new in the Tess 3.02 training set? Is
> it the same set of fonts with extra data or does it add many new fonts?
>
>

A short description of the 3.02 language components is on the
combine_tessdata manual page [1]. Walking through the svn revisions you can
find these changes:

- r526 [2] added support for the fixed-length-dawgs, cube-unicharset and
  cube-word-dawg components
- r658 [3] added support for the shapetable, bigram-dawg, unambig-dawg and
  params-training-model components

As far as I have checked, unambig-dawg and params-training-model are not
present in any of the available language data files.
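
To check which components a given traineddata file actually contains, something like the Python sketch below may help. It assumes the container layout I remember from tessdatamanager.h: a little-endian 32-bit entry count, followed by one 64-bit offset per component, with -1 marking a component that is absent. The component order is also an assumption taken from that header at r658, and the names list_components and COMPONENTS are made up for this example, so please verify against the source before relying on it.

```python
import struct

# Component suffixes in the order assumed from tessdatamanager.h at r658.
# This order is an assumption, not something confirmed in this thread.
COMPONENTS = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
    "fixed-length-dawgs", "cube-unicharset", "cube-word-dawg",
    "shapetable", "bigram-dawg", "unambig-dawg", "params-training-model",
]


def list_components(data):
    """Return the names of the components present in traineddata bytes.

    Assumed layout: int32 entry count, then that many int64 offsets,
    all little-endian; an offset of -1 means the component is missing.
    """
    (count,) = struct.unpack_from("<i", data, 0)
    if not 0 <= count <= len(COMPONENTS):
        raise ValueError("unexpected entry count: %d" % count)
    offsets = struct.unpack_from("<%dq" % count, data, 4)
    return [COMPONENTS[i] for i, off in enumerate(offsets) if off >= 0]
```

Reading eng.traineddata in binary mode and passing the bytes to list_components should then show whether unambig-dawg or params-training-model are there. Unpacking the file with combine_tessdata -u, as described on the manual page [1], should give the same answer.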

[1] http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html#_components
[2] https://code.google.com/p/tesseract-ocr/source/diff?path=/trunk/ccutil/tessdatamanager.h&format=side&r=526&old_path=/trunk/ccutil/tessdatamanager.h&old=441
[3] https://code.google.com/p/tesseract-ocr/source/diff?path=/trunk/ccutil/tessdatamanager.h&format=side&r=658

Zdenko

> Patrick
>
>
> On Monday, December 17, 2012 11:36:04 AM UTC-5, Patrick Questembert wrote:
>>
>> We are using Tesseract 3.01 and about to take the plunge to Tesseract
>> 3.02, and would appreciate a couple of pointers:
>> - latest iOS / i386 "cook book" to build Tesseract for use in iOS app and
>> iOS Simulator (which means arm + 386)?
>> - has anyone compared performance? I read an estimate of 2x faster for
>> 3.02 but is that still true even though the training set is 21.9 MB for
>> 3.02 versus 3.1 MB for 3.01? Surely this has got to slow things down (even
>> if it means higher accuracy).
>>
>> Thanks!
>> Patrick
>>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to tesseract-ocr@googlegroups.com
> To unsubscribe from this group, send email to
> tesseract-ocr+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>