Has anyone succeeded in generating <lang>.traineddata files for Tesseract 3.0
on Windows or Ubuntu? If so, please post the steps here for the benefit of
other users.
-sriranga(76yrsold)

On Sat, Jul 11, 2009 at 11:55 AM, Ray Smith <[email protected]> wrote:

> Help no longer quite so necessary! Although it will still be useful for
> users of the dll I expect.
> There is now a very preliminary version of 3.00 in svn for the brave to
> try.
>
> Warning: I haven't downloaded the svn code myself to test it, but the code
> I think I uploaded works on both linux and windows, with Leptonica and the
> other image I/O libraries.
>
> The solution to the windows CRT problem is like this:
> Leptonica is built in VC++6, with a few hacks to make it do so. This
> results in a dll that depends on msvcrt.dll, and is therefore compatible
> with the other imaging libraries. Since Leptonica has a "safe" API, it can
> work with Tesseract built with either a static CRT or with msvcr90.dll, so
> the svn (and later tar.gz) includes copies of all the dll dependencies, libs
> and header files so no extra work is needed to build it on windows.
>
> There are some download and build instructions in the ReadMe wiki,
> <http://code.google.com/p/tesseract-ocr/wiki/ReadMe>, but there is no
> other documentation at present.
> Here is a brief list of what's in it:
> - Page layout analysis and different modes of layout assumption down to a
>   single character. See baseapi.h
> - 29 languages, which will increase to 33 when it is a proper download. We
>   haven't tested all the languages, but Russian looks good, and Vietnamese
>   is greatly improved.
> - Major changes to the API to move it towards thread-safety. Not there yet,
>   but dramatically improved.
> - Old-style variables eliminated.
> - Language data in a single data file for each language, which simplifies
>   managing the files.
> - Greater stability. Leaks fixed. Memory errors fixed.
>
> Enjoy.
> Ray
>
>
> On Thu, Jul 9, 2009 at 10:43 PM, 74yrs old <[email protected]> wrote:
>
>> If anyone sufficiently expert with VC++2008 has succeeded in compiling on
>> WinXP, please post the details (a step-by-step procedure for compiling in
>> VC++2008) for the benefit of other users, so that they can test and give
>> feedback.
>> With Regards,
>> -sriranga(76yrsold)
>>
>>
>> On Fri, Jul 10, 2009 at 7:40 AM, Ray Smith <[email protected]> wrote:
>>
>>> *This is a plea for help!*
>>> Anyone interested in seeing 3.00 this side of August?
>>>
>>> Here is the status:
>>>
>>> Linux:
>>> Preliminary alpha release compiles and runs. It is slower than 2.04, due
>>> to the new page layout analysis, but the benefits are supposed to outweigh
>>> that:
>>> Page layout analysis.
>>> *Lots* of languages.
>>> more...
>>> In theory the linux version should compile and link happily with
>>> leptonica, given the right combination of apt-gets. Not tested yet, as I
>>> have been bogged down with windows:
>>>
>>> Windows:
>>> Preliminary alpha release also compiles and runs *without leptonica
>>> only*.
>>> DLL is broken due to API change.
>>>
>>> I only have very little time left before I will be away for a while, but
>>> I was hoping to post a pre-alpha version to svn for people to try.
>>>
>>> The problem is that there is no chance of getting the windows version to
>>> work with leptonica any time soon, and without it the flagship page layout
>>> analysis won't work properly.
>>>
>>> Here is the problem:
>>> Leptonica depends on the following lower-level libraries:
>>> libjpeg,
>>> libpng,
>>> libtiff,
>>> zlib.
>>>
>>> DLLs for these are all available for windows, but they are all compiled
>>> to use msvcrt.dll.
>>> Tesseract and Leptonica will not work unless they use the same CRT
>>> (C runtime) as the libraries, and VC++2008, which everyone wants to use,
>>> will not build anything to use msvcrt.dll (not without jumping through
>>> more hoops than I can ask an average tesseract user to do). You must
>>> either link the CRT statically or use msvcr90.dll, a newer version that
>>> contains .net stuff that tesseract doesn't care about.
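
For anyone wanting to check which CRT a given DLL was built against, here is a rough sketch in Python. It is only a heuristic and rests on one assumption: the names of imported DLLs are stored as plain ASCII strings inside a Windows binary, so a case-insensitive search of the raw bytes will usually find them. It does not parse the import table, and it cannot detect a statically linked CRT. The function names are made up for illustration.

```python
from pathlib import Path

# The two CRT DLL names discussed in this thread; other runtimes are ignored.
CRT_NAMES = (b"msvcrt.dll", b"msvcr90.dll")


def crt_names_in(data: bytes) -> set:
    """Return the CRT DLL names that appear anywhere in the raw bytes.

    Heuristic only: a plain substring search, not a real PE import parser.
    """
    lowered = data.lower()
    return {name.decode("ascii") for name in CRT_NAMES if name in lowered}


def report(paths) -> dict:
    """Map each file path to the set of CRT DLL names found in that file."""
    return {str(p): crt_names_in(Path(p).read_bytes()) for p in paths}
```

Calling report() on the image library DLLs and on the Leptonica DLL (whatever their file names are on your machine) should show the same single CRT name for all of them if they are compatible in the sense described above. A binary that lists both names, or a different name from its neighbours, is worth a closer look with a proper dependency viewer.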
>>>
>>> *What I need are statically linked versions of the 4 libraries above
>>> compiled to use a statically linked crt (/MT option) and possibly their
>>> dependencies.*
>>> Alternatively, libraries built for the new msvcr90.dll (/MD) would do,
>>> but that would mean everybody has to have the VC++2008 redistributables.
>>> It might help dll users though, when it is eventually working again.
>>>
>>> *This is not an easy task, as most of the sources for these libraries
>>> don't have vcproj/sln projects with which to build them.*
>>> If anyone is sufficiently expert with VC++2008 and building other
>>> people's code, and understands what I am talking about, I would be grateful
>>> for the help.
>>> The other viable alternative would be to compile leptonica without image
>>> i/o at all, and leave tesseract still unable to read anything other than
>>> uncompressed tiff.
>>>
>>> Ray.
>>>
>>> PS A good place to get all these libraries is:
>>> http://gnuwin32.sourceforge.net/packages/*.htm, where * is tiff, jpeg,
>>> libpng, or zlib.
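
To save some typing, the pattern in the PS can be expanded into the four page addresses with a few lines of Python. The pattern and the package names are copied from the message as written; the function name is made up, and nothing here checks that the pages still exist.

```python
# URL pattern and package names exactly as given in the PS.
PATTERN = "http://gnuwin32.sourceforge.net/packages/*.htm"
PACKAGES = ("tiff", "jpeg", "libpng", "zlib")


def package_pages(pattern=PATTERN, packages=PACKAGES):
    """Substitute each package name for the single '*' in the pattern."""
    return [pattern.replace("*", name, 1) for name in packages]


# Print one address per line, in the order the packages were listed.
for url in package_pages():
    print(url)
```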
>>>
>>> On Tue, May 12, 2009 at 5:49 AM, javolo <[email protected]> wrote:
>>>
>>>>
>>>> Ditto!  I'm working on a pretty cool OCR application, and I'd happily
>>>> help testing for access to the 3.0 beta or release candidate.
>>>> I can test on Ubuntu and Windows XP.
>>>>
>>>> Thanks...
>>>>
>>>> On May 4, 3:07 pm, "Rob H." <[email protected]> wrote:
>>>> > But seriously... I'm writing a fairly interesting application using
>>>> > Tesseract for my client: Gulfstream Aerospace.
>>>> > I have no problem testing 3.0, especially if I can get some
>>>> > performance gains.
>>>>
>>>>
