If anyone has succeeded in generating <lang>.traineddata files for Tesseract 3.0 on Windows or Ubuntu, please post the steps here for the benefit of other users.
-sriranga(76yrsold)

On Sat, Jul 11, 2009 at 11:55 AM, Ray Smith <[email protected]> wrote:
> Help no longer quite so necessary! Although it will still be useful for
> users of the dll I expect.
> There is now a very preliminary version of 3.00 in svn for the brave to
> try.
>
> Warning: I haven't downloaded the svn code myself to test it, but the code
> I think I uploaded works on both linux and windows, with Leptonica and the
> other image I/O libraries.
>
> The solution to the windows CRT problem is like this:
> Leptonica is built in VC++6, with a few hacks to make it do so. This
> results in a dll that depends on msvcrt.dll, and is therefore compatible
> with the other imaging libraries. Since Leptonica has a "safe" API, it can
> work with Tesseract built with either a static CRT or with msvcr90.dll, so
> the svn (and later tar.gz) includes copies of all the dll dependencies, libs
> and header files so no extra work is needed to build it on windows.
>
> There are some download and build instructions in the ReadMe wiki
> <http://code.google.com/p/tesseract-ocr/wiki/ReadMe>, but there is no
> other documentation at present.
> Here is a brief list of what's in it:
> Page layout analysis and different modes of layout assumption down to a single character. See baseapi.h
> 29 languages that will increase to 33 when it is a proper download. We haven't tested all the languages, but Russian looks good, and Vietnamese is greatly improved.
> Major changes to the API to move it towards thread-safety. Not there yet, but dramatically improved.
> Old-style variables eliminated.
> Language data in a single data file for each language simplifies managing the files.
> Greater stability. Leaks fixed. Memory errors fixed.
>
> Enjoy.
> Ray
>
>
> On Thu, Jul 9, 2009 at 10:43 PM, 74yrs old <[email protected]> wrote:
>
>> If anyone sufficiently expert with VC++2008 has succeeded in compiling
>> on WinXP, please post the details (the step-by-step procedure for
>> compiling in VC++2008) for the benefit of other users, so they can test
>> and give feedback.
>> With Regards,
>> -sriranga(76yrsold)
>>
>>
>> On Fri, Jul 10, 2009 at 7:40 AM, Ray Smith <[email protected]> wrote:
>>
>>> *This is a plea for help!*
>>> Anyone interested in seeing 3.00 this side of August?
>>>
>>> Here is the status:
>>>
>>> Linux:
>>> Preliminary alpha release compiles and runs. It is slower than 2.04, due
>>> to the new page layout analysis, but the benefits are supposed to outweigh
>>> that:
>>> Page layout analysis.
>>> *Lots* of languages.
>>> more...
>>> In theory the linux version should compile and link happily with
>>> leptonica, given the right combination of apt-gets. Not tested yet, as I
>>> have been bogged down with windows:
>>>
>>> Windows:
>>> Preliminary alpha release also compiles and runs *without leptonica
>>> only*.
>>> DLL is broken due to API change.
>>>
>>> I only have very little time left before I will be away for a while, but
>>> I was hoping to post a pre-alpha version to svn for people to try.
>>>
>>> The problem is that there is no chance of getting the windows version to
>>> work with leptonica any time soon, and without it the flagship page layout
>>> analysis won't work properly.
>>>
>>> Here is the problem:
>>> Leptonica depends on the following lower-level libraries:
>>> libjpeg,
>>> libpng,
>>> libtiff,
>>> zlib.
>>>
>>> DLLs for these are all available for windows, but they are all compiled
>>> to use msvcrt.dll.
>>> Tesseract and Leptonica will not work unless they use the same crt
>>> (C-runtime) as the libraries, and VC++2008, which everyone wants to use,
>>> will not (without jumping through more hoops than I can ask an average
>>> tesseract user to do) build anything to use msvcrt.dll. You must use
>>> either a statically linked crt, or use msvcr90.dll, a newer version that
>>> contains .net stuff that tesseract doesn't care about.
>>>
>>> *What I need are statically linked versions of the 4 libraries above
>>> compiled to use a statically linked crt (/MT option) and possibly their
>>> dependencies.*
>>> Alternatively, libraries built for the new msvcr90.dll (/MD) would do,
>>> but that would mean everybody has to have the VC++2008 redistributables.
>>> It might help dll users though, when it is eventually working again.
>>>
>>> *This is not an easy task, as most of the sources for these libraries
>>> don't have vcproj/sln projects with which to build them.*
>>> If anyone is sufficiently expert with VC++2008 and building other
>>> people's code, and understands what I am talking about, I would be grateful
>>> for the help.
>>> The other viable alternative would be to compile leptonica without image
>>> i/o at all, and leave tesseract still unable to read anything other than
>>> compressed tiff.
>>>
>>> Ray.
>>>
>>> PS A good place to get all these libraries is:
>>> http://gnuwin32.sourceforge.net/packages/*.htm, where * is tiff, jpeg,
>>> libpng, or zlib.
>>>
>>> On Tue, May 12, 2009 at 5:49 AM, javolo <[email protected]> wrote:
>>>
>>>>
>>>> Ditto! I'm working on a pretty cool OCR application, and I'd happily
>>>> help testing for access to the 3.0 beta or release candidate.
>>>> I can test on Ubuntu and Windows XP.
>>>>
>>>> Thanks...
>>>>
>>>> On May 4, 3:07 pm, "Rob H." <[email protected]> wrote:
>>>> > But seriously... I'm writing a fairly interesting application using
>>>> > Tesseract for my client: Gulfstream Aerospace.
>>>> > I have no problem testing 3.0, especially if I can get some
>>>> > performance gains.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
-~----------~----~----~----~------~----~------~--~---
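
For anyone trying to check which C runtime a given DLL was built against (the msvcrt.dll versus msvcr90.dll mismatch Ray describes in the thread), here is a rough Python sketch. It is only a heuristic: it scans the raw bytes of a file for CRT DLL names, relying on the fact that imported DLL names are stored as plain ASCII in Windows binaries. The function name crt_names and the regular expression are made up for this illustration and are not part of Tesseract or Leptonica. A proper tool such as dumpbin /dependents should give a more reliable answer.

```python
import re
import sys

# Matches CRT DLL names such as msvcrt.dll and msvcr90.dll, plus the debug
# variants msvcrtd.dll and msvcr90d.dll. This pattern is an assumption made
# for the sketch; it is not an exhaustive list of Microsoft runtimes.
CRT_NAME = re.compile(rb"msvcr(?:t|\d+)d?\.dll", re.IGNORECASE)


def crt_names(binary):
    """Return the set of lower-cased CRT DLL names found in the raw bytes."""
    return {m.group(0).decode("ascii").lower() for m in CRT_NAME.finditer(binary)}


if __name__ == "__main__":
    # Usage: python crt_check.py libtiff3.dll leptonlib.dll tesseract.exe
    # (the file names here are placeholders, not names taken from the thread)
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, sorted(crt_names(handle.read())))
```

Run over the libjpeg, libpng, libtiff and zlib DLLs together with your own build, this should show at a glance whether they all name the same runtime. A file that prints an empty list was probably linked against a static CRT (/MT), though the scan cannot confirm that by itself.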

