You can use a 3.01 language data file in 3.02 (tested ;-) ).
Training for 3.02 requires[1] an additional tool, shapeclustering[2],
but I have not tested whether it makes a difference (e.g. a 3.01 vs. a 3.02
language data file). Maybe Nick did some tests (he created grc[2] files for
2.0x, 3.01[3] and 3.02[4])...

[1] http://code.google.com/p/tesseract-ocr/issues/detail?id=629#c8
[2] http://tesseract-ocr.googlecode.com/svn/trunk/doc/shapeclustering.1.html
[3] http://code.google.com/p/tesseract-ocr/issues/detail?id=770
[4] http://code.google.com/p/tesseract-ocr/issues/detail?id=754
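
If you do decide to retrain, here is a rough sketch of where shapeclustering
fits into the 3.02 training sequence. It only builds and prints the command
lines and runs nothing. The command order and flags are my recollection of the
3.02 training documentation, so check them against your installed tools. The
function name, the "eng"/"myfont" names and the file naming are hypothetical
placeholders.

```python
def training_commands_302(lang, font, exp=0):
    """Return the assumed 3.02 training steps as argument lists (nothing is executed)."""
    base = "%s.%s.exp%d" % (lang, font, exp)  # hypothetical naming, e.g. eng.myfont.exp0
    tr = base + ".tr"
    return [
        # Produce the .tr feature file from the image and its box file.
        ["tesseract", base + ".tif", base, "nobatch", "box.train"],
        # Build the unicharset from the box file.
        ["unicharset_extractor", base + ".box"],
        # The additional step in 3.02: runs before mftraining.
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset", tr],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", lang + ".unicharset", tr],
        ["cntraining", tr],
        # Renaming the output files and running combine_tessdata would follow.
    ]


if __name__ == "__main__":
    for cmd in training_commands_302("eng", "myfont"):
        print(" ".join(cmd))
```

Retraining this way and running both data files on the same test set would
show whether the H/M and O/D confusions change.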

-- 
Zdenko

On Mon, Oct 1, 2012 at 11:10 AM, Speedy <[email protected]> wrote:

> Hi,
>
> I'll try another shot: When I move from tesseract 3.01 to tesseract 3.02
> should I retrain my fonts with the 3.02 training tools or does this not
> matter?
>
> Best regards,
> Marcus
>
> On Thursday, September 20, 2012 4:31:50 PM UTC+2, Speedy wrote:
>
>> Hi there,
>>
>> we are currently using tesseract 3.01 as OCR engine and have trained a
>> number of fonts with it. Things work quite well, but we would like to move
>> to version 3.02 for two reasons:
>>
>>    - It is possible to combine fonts
>>    - The character recognition is supposed to be significantly improved
>>
>> In our tests we found that the character recognition has changed, but the
>> results are mixed. In particular, quite a few characters that previously
>> had few confusions now have none (which is good), but then there are
>> characters that are much worse, making the overall result worse. For
>> example, in one dataset the number of confusions from H to M has increased
>> from 7 to 52 and the number of confusions from O to D has increased from 15
>> to 37.
>>
>> Is there a difference in the font files between tesseract 3.01 and 3.02?
>> Does it matter to tesseract 3.02 whether a font was trained with 3.01
>> training? Would it help to retrain the fonts with tesseract 3.02 training
>> tools or should this not matter?
>>
>> In what way was character recognition improved in tesseract 3.02?
>>
>> Thanks in advance for any help you can provide!
>>
>> Best regards,
>> Marcus
>>
>>
>  --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

