Thanks for your input, Dmitri.

Raj

On Jul 6, 10:31 pm, Dmitri Silaev <[email protected]> wrote:
> Yes, this is possible, at least in theory. In box files you can map
> arbitrary glyphs to character sequences. However, chances are high
> that you'll run into accuracy problems. Two come to mind at the
> moment. First, although Tesseract is somewhat tolerant of glyph
> variations, these can be quite large in handwritten text. Second,
> Tesseract scales every glyph internally (this is called
> normalization), so many word glyphs that look obviously different to
> the human eye can be recognized as the same word. For the same reason
> Tess may confuse word glyphs if their lengths vary a lot and there
> are very long words. What counts as "vary a lot" and "very long"
> would have to be determined experimentally, though.
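
To make the box-file idea concrete, here is a minimal sketch of turning
word-level annotations (transcription plus word coordinates) into box-file
lines. The WordRegion type, the function names and the top-left-origin
input format are assumptions made for illustration; they are not part of
Tesseract. The output follows the Tesseract 3.x box layout, one entry per
line as "text left bottom right top page", with y measured from the bottom
of the image. Whether training on whole-word boxes gives usable accuracy is
untested here.

```python
from typing import Iterable, NamedTuple


class WordRegion(NamedTuple):
    """One transcribed word and its bounding box (hypothetical input format)."""
    text: str
    left: int    # pixels from the left edge of the image
    top: int     # pixels from the TOP edge of the image
    width: int
    height: int


def to_box_line(word: WordRegion, image_height: int, page: int = 0) -> str:
    """Format one word as a box-file line: text left bottom right top page."""
    if not word.text or any(ch.isspace() for ch in word.text):
        # Fields are space-separated, so the text itself must not contain spaces.
        raise ValueError(f"box text must be non-empty without whitespace: {word.text!r}")
    # Box files measure y from the bottom of the image, so flip the axis.
    bottom = image_height - (word.top + word.height)
    top = image_height - word.top
    right = word.left + word.width
    return f"{word.text} {word.left} {bottom} {right} {top} {page}"


def to_box_file(words: Iterable[WordRegion], image_height: int, page: int = 0) -> str:
    """Join the box lines for all words on one page into box-file text."""
    return "".join(to_box_line(w, image_height, page) + "\n" for w in words)
```

The resulting text would be saved next to the page image with a .box
extension before running the usual command-line training steps.
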
>
> BTW I suppose you mean that your historic documents use a connected
> script, as not all cursive is necessarily connected; see
> http://en.wikipedia.org/wiki/Cursive. With letters that are only
> sloppy but not connected, the problem is much easier, and imho it
> makes sense to spend some time devising a good segmentation algo and
> pre- and post-processing logic to use Tess in a more traditional way.
>
> HTH
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Wed, Jul 6, 2011 at 7:42 PM, Raj Julha <[email protected]> wrote:
> > Hi
>
> > I'm planning to train Tesseract on handwritten text, from mainly
> > historical documents. Because of the cursive nature of the handwritten
> > text it is difficult to isolate single characters so I was planning to
> > create images of words and then use a list of words as training
> > source. Alternatively I could create a text file with the handwritten
> > transcription and the coordinates of each word on the image. Can I use
> > that as input for tesseract training? I'm mainly interested in using
> > the command line version.
>
> > Cheers
>
> > Raj
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en

