Thanks for all these answers, Patrick. Regarding speed, that's OK; I
only wanted a rough idea, not precise numbers.

One more question: which revision of 3.01 did you manage to
cross-compile? I tried r552 and r581 without success, while the
tesseract-3.00.tar.gz available on code.google.com compiles correctly.

Thanks,

Cyril

On Jul 23, 6:10 am, patrickq <[email protected]> wrote:
> ScanBizCards actually uses Tesseract 3.01 - I believe the fears
> expressed by many on this forum about using "non official" versions of
> Tesseract are misplaced. We switched from 2.04 to 3.00 as soon as 3.0
> was made available - and only benefited from it - then switched to
> 3.01 quickly - and again experienced significant improvements (and
> only rare cases where Tesseract 3.01 did less well than Tesseract
> 3.00).
>
> Tesseract's image processing is acceptable in the case of images with
> a fairly uniform background and text with high contrast relative to
> the background, such as this example:
> http://www.scanbizcards.com/benefitquest.jpg
>
> Since we very often get images with shadows, bad lighting and
> backgrounds with strong colors, ScanBizCards applies its own image
> processing and calls Tesseract only with a black and white image. This
> is what we produce on the above example (in this case results are
> identical, and good, without our preprocessing too):
> http://www.scanbizcards.com/benefitquest-bw.jpg
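Patrick does not describe how ScanBizCards preprocesses images, so what follows is only a sketch of one generic technique aimed at the same problem: local-mean (Bradley-Roth style) adaptive thresholding. Because the threshold follows the local mean, it copes with shadows and uneven lighting better than a single global threshold does. The sketch assumes an 8-bit grayscale image stored as a list of rows, and the `window` and `bias` defaults are illustrative, not tuned. A real app would more likely call a library routine such as OpenCV's `cv::adaptiveThreshold`.

```python
def integral_image(gray):
    """Summed-area table with an extra zero row/column for easy lookups."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_threshold(gray, window=15, bias=0.15):
    """Binarize with a local-mean threshold (Bradley-Roth style).

    A pixel becomes black (0) if it is more than `bias` (a fraction)
    darker than the mean of the window x window neighbourhood around it,
    otherwise white (255). Windows are clipped at the image border.
    """
    h, w = len(gray), len(gray[0])
    ii = integral_image(gray)
    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            # Sum of the window in O(1) thanks to the summed-area table.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # Equivalent to gray[y][x] < mean * (1 - bias), without division.
            if gray[y][x] * area < total * (1.0 - bias):
                out[y][x] = 0
    return out


# Synthetic "shadow": background fades from 200 (left) to 83 (right),
# with two dark vertical strokes at half the local background brightness.
h, w = 20, 40
gray = [[200 - 3 * x for x in range(w)] for _ in range(h)]
for y in range(5, 15):
    for x in (5, 30):
        gray[y][x] = (200 - 3 * x) // 2
bw = adaptive_threshold(gray)
```

A fixed threshold of 128 would turn the whole darker right-hand part of this synthetic image black, whereas the local mean keeps only the two strokes black. The summed-area table makes each pixel cost O(1) regardless of window size, although pure Python is still far too slow for a phone; the point is the technique, not the speed.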
>
> Regarding performance (on iPhone 3GS):
> - we spend 3 seconds on image processing for a 1,024 x 768 image
> - OCR then takes us 14 seconds, but that's not just Tesseract: it
> includes time spent in our own code
>
> If you'd like, I can get you performance numbers for Tesseract alone,
> and for the iPhone 4; let me know.
>
> Regarding layout analysis: it's available and it works. I don't know
> if there is an API that returns the coordinates of words; we use the
> sequence of boxes for each letter, then determine where there should
> be a space (we don't trust Tesseract's space decision much) or a
> newline, which gives us the coordinates of words. From my limited
> comparison, Tesseract 3.01 layout analysis is about the same as
> Tesseract 3.0.
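The thread does not cover Tesseract's own word-level API, so here is a hypothetical sketch of the approach Patrick describes: walk the per-character boxes in reading order and decide where spaces and newlines fall. The `Box`/`Word` types, the vertical-overlap line test and the default `gap_factor` of half the median character width are all assumptions made for illustration, not ScanBizCards' or Tesseract's actual logic.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    """One recognized character with its bounding box (pixels, y grows down)."""
    ch: str
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class Word:
    """A word assembled from character boxes, with its line index."""
    text: str
    left: int
    top: int
    right: int
    bottom: int
    line: int


def group_words(boxes, gap_factor=0.5):
    """Merge per-character boxes, given in reading order, into words.

    Hypothetical heuristics:
    - a new line starts when the next box does not overlap the previous
      one vertically, or when it jumps back to the left;
    - a space is assumed when the horizontal gap exceeds
      gap_factor * the median character width.
    """
    if not boxes:
        return []
    space = gap_factor * median(b.right - b.left for b in boxes)
    line = 0
    groups = [(line, [boxes[0]])]
    for prev, b in zip(boxes, boxes[1:]):
        new_line = (b.top > prev.bottom or b.bottom < prev.top
                    or b.left < prev.left)
        if new_line:
            line += 1
        if new_line or b.left - prev.right > space:
            groups.append((line, [b]))   # start a new word
        else:
            groups[-1][1].append(b)      # extend the current word
    # The word box is the union of its character boxes.
    return [
        Word(
            text="".join(b.ch for b in chars),
            left=min(b.left for b in chars),
            top=min(b.top for b in chars),
            right=max(b.right for b in chars),
            bottom=max(b.bottom for b in chars),
            line=ln,
        )
        for ln, chars in groups
    ]


boxes = [
    Box("H", 0, 0, 8, 12), Box("i", 10, 0, 18, 12),
    Box("y", 30, 2, 38, 14), Box("o", 40, 0, 48, 12), Box("u", 50, 0, 58, 12),
    Box("o", 0, 30, 8, 42), Box("k", 10, 30, 18, 42),
]
for w in group_words(boxes):
    print(w.line, w.text, (w.left, w.top, w.right, w.bottom))
# 0 Hi (0, 0, 18, 12)
# 0 you (30, 0, 58, 14)
# 1 ok (0, 30, 18, 42)
```

Tying the space threshold to the median character width, rather than to a fixed pixel count, keeps the rule roughly independent of image resolution. Whether that beats Tesseract's own space decision on real business cards is something only testing would show.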
>
> Patrick
>
> On Jul 20, 11:20 am, Cyril <[email protected]> wrote:
> > Hi,
>
> > I have some basic questions before starting a project of OCR
> > recognition for the iPhone.
>
> > I have seen the steps to cross-compile tesseract for iOS but have some
> > questions on tesseract roadmap itself:
> > 1/ should I start with Tesseract 2.04 or 3.0? From my understanding,
> > 3.0 is not yet stable but is undergoing a major refactoring and
> > gaining several features (including document layout analysis). The
> > current 3.0 "release" is quite far from the head of the trunk, which
> > does not seem to compile on iOS, so I am wondering whether a new
> > release (3.01?) compatible with iOS is planned soon.
> > 2/ are the accuracy and speed of the 3.0 release better than, or at
> > least similar to, those of the 2.04 release?
> > 3/ is the document layout analysis already stable? In particular, I
> > need to get the position of a given recognized word in the document.
> > Is this possible with Tesseract?
> > 4/ what are the typical preprocessing steps involved in OCR (b&w
> > conversion, thresholding, etc.)? Are they already performed by
> > Tesseract, or do I need to perform them myself? If the latter, which
> > library is usually used: Leptonica or OpenCV?
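As a rough illustration of the two steps named in question 4, here is a sketch of grayscale conversion followed by Otsu's global threshold. It uses ITU-R BT.601 luma weights and assumes the image is a list of rows of `(r, g, b)` tuples with 8-bit channels. Leptonica and OpenCV both ship optimized versions of these operations, so this is meant for understanding the concepts, not for production use.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to 8-bit luma (ITU-R BT.601 weights)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold t maximizing between-class variance (Otsu).

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels with value <= t
    sum0 = 0   # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: no valid split here
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """Black (0) for pixels <= t, white (255) otherwise."""
    return [[0 if v <= t else 255 for v in row] for row in gray]


# Light card background with a dark strip of "text" on the right.
img = [[(250, 250, 240)] * 6 + [(20, 20, 30)] * 2 for _ in range(4)]
gray = to_gray(img)
t = otsu_threshold(gray)
print(t)  # 21
bw = binarize(gray, t)
```

A single global threshold like this works well on evenly lit, high-contrast images. Shadows and colored backgrounds usually call for a local (adaptive) threshold instead.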
>
> > I would also be interested in pointers to code samples that
> > demonstrate the API usage, or to tutorials on OCR concepts or on the
> > Tesseract APIs. Any pointers to the state of the art in OCR,
> > including papers on preprocessing techniques that affect
> > performance, are also welcome.
>
> > I have seen that ScanBizCards is using Tesseract 3.0. Do you know of
> > other iPhone applications using Tesseract, or of competing solutions
> > (commercial or open-source)?
>
> > Thanks in advance for all your answers,
>
> > Cyril

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
