Re: Tesseract 3.0 Performance down at 32bit Os

Dmitri Silaev Wed, 06 Nov 2013 13:31:27 -0800

Andreas,

You are talking about vectorization, which has nothing to do with what
I was talking about: this optimization technique still works within a
single core. One vector (SIMD) instruction processes several data
elements at once, but on the same core (see any article on vector
instructions).

But even if you were talking about the relevant topic, automatic
parallelization
(http://msdn.microsoft.com/en-us/library/vstudio/hh872235.aspx,
http://gcc.gnu.org/wiki/AutoParInGCC), then formally, yes, the compiled
program takes advantage of multiple cores, but Tesseract's code itself
does not. It's not a magic wand anyone can wave (bam! and your program
is parallel). It doesn't work like that. A real advantage is achieved
when *you write* your program in a parallel manner.

I can't tell whether the speed changed "significantly", but from my own
experiments with these optimizations one thing can be said for sure:
for my test images the difference in the recognition results was
*huge*. So huge that some results of the "optimized version" could be
called useless. Conclusion: it is dangerous to use any optimization
options with Tesseract without substantial testing.

There is a thread on this forum, still active, about differences in
Tesseract's results; optimizations are one of the causes. The same
happens across platforms. I personally tested with Intel and ARM
processors; on Windows, Linux and iOS; with the VC2010, gcc and LLVM
compilers.

Nevertheless, it would be interesting for me and all forum members to
see some numbers, or at least qualitative estimates, of Tesseract's
execution speed with auto-vectorization turned on. For some types of
images it probably can help. Are you sure you can't remember anything?

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Wed, Nov 6, 2013 at 7:31 PM, Andreas Romeyke
<[email protected]> wrote:
>
> Hello Dmitri,
>
>
> On Thursday, 31 October 2013 09:01:44 UTC+1, Dmitri Silaev wrote:
>
>>
>> Tesseract at this time does not take advantage of multi-processor or
>> multi-core architectures. A single instance of a Tesseract-enabled app
>> runs on a single core, hence all that matters is the clock, memory
>> and FSB speed, cache, etc. That is, unless you make your own effort to
>> write a multi-threaded app based on the Tesseract API, chopping an
>> input image into pieces and processing them in parallel. However,
>> special care and manual changes to the Tesseract code are required to
>> make it thread-safe.
>>
> That is not right. Tesseract can profit if compiled with gcc
> auto-vectorization enabled. The speed difference will be significant (I
> do not remember the exact results of my auto-vectorization experiments).
>
> See http://gcc.gnu.org/projects/tree-ssa/vectorization.html for details on
> how to enable it.
>
> With best regards
>
> Andreas
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
