You can use a 3.01 language data file with 3.02 (tested ;-) ). Training for 3.02 requires [1] an additional tool, shapeclustering [2], but I have not tested whether it makes a difference (e.g. a 3.01 vs. a 3.02 language data file). Maybe Nick did some tests (he created grc [2] files for 2.0x, 3.01 [3] and 3.02 [4])...
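One way to check whether the data file version makes a difference would be to run both setups on the same images and count per-character confusions against ground-truth text. Below is a minimal sketch of that counting step, not a tested evaluation tool. It assumes you already have the ground truth and the OCR output as plain strings; the function name `count_confusions` is hypothetical, and aligning the two strings with Python's `difflib` is my own choice here, not something taken from the Tesseract tools.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_confusions(truth: str, ocr: str) -> Counter:
    """Count (expected, recognized) character substitutions.

    Hypothetical helper: aligns the two strings with difflib and
    counts only one-to-one substitutions.
    """
    counts = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length 'replace' spans map cleanly onto substitutions;
        # insertions, deletions and uneven replacements are skipped.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for expected, got in zip(truth[i1:i2], ocr[j1:j2]):
                counts[(expected, got)] += 1
    return counts


if __name__ == "__main__":
    # Made-up strings for illustration only.
    truth = "HOME HOLD"
    ocr_run = "MOME HDLD"
    for (expected, got), n in count_confusions(truth, ocr_run).most_common():
        print(f"{expected} -> {got}: {n}")
```

Running this once per setup (3.01 data with 3.02, then retrained 3.02 data with 3.02) and comparing the two counters should show which confusions, such as H to M or O to D, actually change.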
[1] http://code.google.com/p/tesseract-ocr/issues/detail?id=629#c8
[2] http://tesseract-ocr.googlecode.com/svn/trunk/doc/shapeclustering.1.html
[3] http://code.google.com/p/tesseract-ocr/issues/detail?id=770
[4] http://code.google.com/p/tesseract-ocr/issues/detail?id=754

--
Zdenko

On Mon, Oct 1, 2012 at 11:10 AM, Speedy <[email protected]> wrote:
> Hi,
>
> I'll try another shot: When I move from tesseract 3.01 to tesseract 3.02,
> should I retrain my fonts with the 3.02 training tools, or does this not
> matter?
>
> Best regards,
> Marcus
>
> On Thursday, September 20, 2012 4:31:50 PM UTC+2, Speedy wrote:
>
>> Hi there,
>>
>> we are currently using tesseract 3.01 as our OCR engine and have trained a
>> number of fonts with it. Things work quite well, but we would like to move
>> to version 3.02 for two reasons:
>>
>> - It is possible to combine fonts
>> - The character recognition is supposed to be significantly improved
>>
>> In our tests we found that the character recognition has changed, but the
>> results are mixed. In particular, quite a few characters that previously
>> had a few confusions now have none (which is good), but then there are
>> characters that are much worse, making the overall result worse. For
>> example, in one dataset the number of confusions from H to M has increased
>> from 7 to 52 and the number of confusions from O to D has increased from 15
>> to 37.
>>
>> Is there a difference in the font files between tesseract 3.01 and 3.02?
>> Does it matter to tesseract 3.02 whether a font was trained with the 3.01
>> training tools? Would it help to retrain the fonts with the tesseract 3.02
>> training tools, or should this not matter?
>>
>> In what way was character recognition improved in tesseract 3.02?
>>
>> Thanks in advance for any help you can provide!
>>
>> Best regards,
>> Marcus
>>
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

