Re: Tesseract 3.02.02 Released

Prajesh Ananthan Wed, 28 Nov 2012 07:03:02 -0800

Appreciate this. Just wondering, is there a C# variant of Tesseract? Thanks 
in advance.


On Saturday, November 3, 2012 10:50:34 PM UTC+8, zdenop wrote:
>
> Hello all,
>
> Tesseract OCR 3.02 has been released (as 3.02.02). You can find it in the 
> download section [1] or on the Project page under "Featured".
>
>
> *Tesseract release notes - V3.02*
>
>    - Moved ResultIterator/PageIterator to ccmain. 
>    - Added Right-to-left/Bidi capability in the output iterators for 
>    Hebrew/Arabic.
>    - Added paragraph detection in layout analysis/post OCR. 
>    - Fixed inconsistent xheight during training and over-chopping.
>    - Added simultaneous multi-language capability. 
>    - Refactored top-level word recognition module.
>    - Added experimental equation detector. 
>    - Improved handling of resolution from input images.
>    - Blamer module added for error analysis. 
>    - Cleaned up externally used namespace by removing includes from 
>    baseapi.h.
>    - Removed dead memory management code. 
>    - Tidied up constraints on control parameters.
>    - Added support for ShapeTable in classifier and training. 
>    - Refactored class pruner.
>    - Fixed training leaks and randomness.
>    - Major improvements to layout analysis for better image detection, 
>    diacritic detection, better textline finding, better tabstop finding.
>    - Improved line detection and removal. 
>    - Added fixed pitch chopper for CJK.
>    - Added UNICHARSET to WERD_CHOICE to make multi-language handling 
>    easier. 
>    - Fixed problems with internally scaled images.
>    - Added page and bbox to string in tr files to identify source of 
>    training data better. 
>    - Fixes to Hindi Shiroreka splitter.
>    - Added word bigram correction. 
>    - Reduced stack memory consumption and eliminated some ugly typedefs.
>    - Added new uniform classifier API. 
>    - Added new training error counter.
>    - Fixed endian bug in dawg reader. 
>    - C API (thanks to Tobias Müller)
>    - New solution for VS 2008 (thanks to Tom Powers) 
>    - Many other fixes, including the way in which the chopper finds chops 
>    and messes with the outline while it does so. 
>
> The Windows installer was built on Windows XP SP3 with the NSIS tool. 
> Tesseract.exe (and the training tools) is a 32-bit static build made with 
> VC++ 2008 Express, so you may need the Microsoft Visual C++ 2008 SP1 
> Redistributable Package (x86) [2].
>
> All Google-generated language data were updated (community language data 
> files have not been updated yet).
> New languages available from Google: afr, aze, bel, ben, chr, enm, epo, 
> est, eus, frm, glg, ita_old, kan, mal, mkd, mlt, msa, spa_old, sqi, swa, 
> tam, tel.
> Cube data files are also available for ita, fra, rus, spa.
> Added experimental equation detector (equ).
> There is also a new community language, Ancient Greek (grc), thanks to 
> Nick White.
>
> Language data files created for 3.00 and 3.01 can be used in 3.02. 
> Language data files created with Tesseract OCR 3.02 will not work in 
> previous versions.
>
> Thanks to all of you who shared your know-how and tested Tesseract 3.02 in svn.
> Thanks Google for supporting this project!
>
> [1] http://code.google.com/p/tesseract-ocr/downloads/list
> [2] http://www.microsoft.com/en-us/download/details.aspx?id=5582&WT.mc_id=MSCOM_EN_US_DLC_DETAILS_121LSUS007998
>  
>
> -- 
> Zdenko Podobný
> Community project contributor
>

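For anyone scripting around the language data note in the announcement, here is a minimal Python sketch of that compatibility rule. The function name and the COVERED tuple are hypothetical and not part of Tesseract. The sketch only encodes what the announcement states for 3.00, 3.01 and 3.02, and it assumes (not stated above) that data always works with the release that created it.

```python
from typing import Optional

# Hypothetical helper, not part of Tesseract. It encodes only the rule
# stated in the 3.02 announcement; nothing here inspects real data files.
COVERED = ("3.00", "3.01", "3.02")


def traineddata_usable(created_with: str, engine: str) -> Optional[bool]:
    """Say whether language data from one release can be used by another.

    Returns True or False where the announcement is explicit, and None
    where it says nothing (for example 3.00 data under 3.01).
    """
    if created_with not in COVERED or engine not in COVERED:
        raise ValueError("the announcement only covers 3.00, 3.01 and 3.02")
    if created_with == engine:
        # Assumption: data works with the release that produced it.
        return True
    if engine == "3.02":
        # "Language data files created for 3.00 and 3.01 can be used in 3.02."
        return True
    if created_with == "3.02":
        # "... created with Tesseract OCR 3.02 will not work in previous versions."
        return False
    # 3.00 <-> 3.01 is not addressed by the announcement.
    return None
```

Calling traineddata_usable("3.01", "3.02") gives True and traineddata_usable("3.02", "3.01") gives False. The None case is deliberate, so that a caller does not mistake "not mentioned" for "works".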