HMM-based POS tagger for the sandbox

Thilo Goetz Tue, 23 Oct 2007 09:12:35 -0700

All,

I'm currently working with a student, Eugenie Giesbrecht,
who is implementing an HMM-based part-of-speech tagger
for inclusion in the sandbox.  This is 100% original
work of Eugenie's for Apache, and we'll start checking
in code during the next few days.


The only data Eugenie currently has for experimentation
is the Brown corpus of American English.  If you have
any POS-tagged data that we could use for training
(English or other languages), please let us know.
The usual license restrictions apply.  I don't think we
can use any data that's only free for research purposes.

Please let us know if you have any suggestions or would
like to help ;-)  Once Eugenie has something running,
we'll make an announcement on the user list.

Eugenie has an ICLA on file with the ASF.

--Thilo
