Re: Word order modelling

Ray Smith Tue, 03 Feb 2009 14:58:17 -0800

Hi Michael,
I assume by word order modelling you mean part-of-speech tagging and
modelling the order of the parts of speech, perhaps with some HMM-based
model?

In any case, there are two ways to do this:

(a) Easier to do, but with a smaller effect:
Modify the adjust_word function to use the local context to promote
words that fit the model.

(b) More difficult to do, but with a possibly larger effect:
Split the dictionary by part of speech into multiple (possibly overlapping)
sub-dictionaries. Change permute_words to search each of the
sub-dictionaries, so that adjust_word is more likely to have a good range of
candidates to choose from.
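
To make the two options a little more concrete, here is a rough,
standalone Python sketch of the idea. It is not Tesseract code and does not
reflect the real signatures of adjust_word or permute_words. The lexicon,
the transition table and every function name in it (context_bonus, rescore,
split_by_pos) are made up for illustration, and a simple tag-bigram table
stands in for a full HMM.

```python
import math

# Hypothetical toy lexicon mapping each word to its possible POS tags.
# Not taken from Tesseract; a word may carry more than one tag.
POS_LEXICON = {
    "the": {"DET"},
    "cat": {"NOUN"},
    "saw": {"NOUN", "VERB"},
    "eat": {"VERB"},
    "sat": {"VERB"},
}

# Hypothetical tag-bigram probabilities P(tag | previous tag).
TRANSITIONS = {
    ("DET", "NOUN"): 0.6,
    ("NOUN", "VERB"): 0.5,
    ("VERB", "DET"): 0.4,
}
FLOOR = 0.01  # assumed probability for any transition not listed above


def context_bonus(prev_tags, word):
    """Log of the best transition probability from a previous tag to a tag of word."""
    best = FLOOR
    for prev in prev_tags:
        for tag in POS_LEXICON.get(word, set()):
            best = max(best, TRANSITIONS.get((prev, tag), FLOOR))
    return math.log(best)


def rescore(prev_word, candidates, weight=1.0):
    """Option (a): add a context bonus to each (word, log_score) candidate.

    Returns the candidates sorted best-first by the combined score.
    """
    prev_tags = POS_LEXICON.get(prev_word, set())
    rescored = [
        (word, score + weight * context_bonus(prev_tags, word))
        for word, score in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def split_by_pos(lexicon):
    """Option (b): build one sub-dictionary per tag; they may overlap."""
    subs = {}
    for word, tags in lexicon.items():
        for tag in tags:
            subs.setdefault(tag, set()).add(word)
    return subs


if __name__ == "__main__":
    # The recognizer slightly prefers "eat", but "cat" fits better after "the".
    print(rescore("the", [("eat", -1.0), ("cat", -1.2)]))
    print(split_by_pos(POS_LEXICON))
```

In this toy version the context bonus is simply added to the recognizer's
log score, so the weight argument controls how strongly the word order
model can override the classifier. How such a bonus would actually be
combined with Tesseract's own ratings is an open design question.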

We try to incorporate improvements into the mainline code. This has been a
bit slow so far, but the turnaround time is improving as I catch up.

Ray.

On Mon, Feb 2, 2009 at 3:04 PM, Michael Reimer <[email protected]> wrote:

>
> Hello all.  I'm a computational linguistics graduate student, and I'd
> like to do some work on Tesseract for credit in a software engineering
> course.  My area of interest is word order modelling and I believe
> this can and has been used to improve the accuracy of other OCR
> systems.  As far as I can tell, Tesseract has nothing similar
> currently, so I'm interested in adding it.  Any feedback on that idea
> would be appreciated.
>
> I'm also completely new to open source, so provided that my general
> goal appeals, I would appreciate any and all advice on how best to get
> involved with your community, get myself up to speed technically (I am
> reading the "Hacking Tesseract" manual currently), avoid stepping on
> toes, and so on.  Thanks.
>

