Hello Everyone,

I am working with a student at my university on applying NLP techniques
to document categorisation of late Middle Irish texts.  I am a programmer
and I know Java, so the coding side won't be a problem.  We are building
a corpus at the moment.

We are working on a specific author, and what we would like to do is
determine whether a particular poem/text is his or not, based on NLP.
I think we would need a few things:

1) A corpus of Middle Irish texts of the same general linguistic range
(we are working on that at the moment).  Is there any documentation or
guidance on how to create such a corpus (or does this just amount to
training the POS tagger)?

2) Train a model on that corpus.

3) Pass that model to the document categoriser, along with the
categories we want (his, not his, and unsure).
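
To make the three steps concrete, here is a rough sketch of the kind of
pipeline I have in mind.  It is not tied to any particular toolkit: the
class name AuthorshipSketch, the labels "his" and "not_his", the margin
parameter, and the choice of character trigrams with add-one smoothing
are all my own assumptions for illustration, and the strings in main are
toy placeholders rather than Middle Irish.  It scores raw characters
only, so this particular sketch needs no part-of-speech tags; whether
that is good enough for real texts is exactly what I am unsure about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Character-trigram naive Bayes with add-one smoothing; an illustrative sketch only. */
final class AuthorshipSketch {
    // Trigram counts per label, total trigram count per label, and all trigrams seen.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> totals = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    /** Step 2: add one labelled text from the corpus to the model. */
    void train(String label, String text) {
        Map<String, Integer> labelCounts = counts.computeIfAbsent(label, k -> new HashMap<>());
        for (String gram : trigrams(text)) {
            labelCounts.merge(gram, 1, Integer::sum);
            totals.merge(label, 1, Integer::sum);
            vocabulary.add(gram);
        }
    }

    /** Average per-trigram log-likelihood of the text under one label (class priors ignored). */
    double score(String label, String text) {
        List<String> grams = trigrams(text);
        if (grams.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> labelCounts = counts.getOrDefault(label, new HashMap<>());
        int total = totals.getOrDefault(label, 0);
        double sum = 0.0;
        for (String gram : grams) {
            int c = labelCounts.getOrDefault(gram, 0);
            // Add-one smoothing, with one extra slot reserved for unseen trigrams.
            sum += Math.log((c + 1.0) / (total + vocabulary.size() + 1.0));
        }
        return sum / grams.size();
    }

    /** Step 3: "his", "not_his", or "unsure" when the two scores differ by less than margin. */
    String classify(String text, double margin) {
        double his = score("his", text);
        double notHis = score("not_his", text);
        if (Math.abs(his - notHis) < margin) {
            return "unsure";
        }
        return his > notHis ? "his" : "not_his";
    }

    /** Lower-cases, collapses whitespace, and returns overlapping character trigrams. */
    static List<String> trigrams(String text) {
        String s = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    public static void main(String[] args) {
        AuthorshipSketch model = new AuthorshipSketch();
        // Toy placeholder strings standing in for corpus texts.
        model.train("his", "aaa aaa aaa aaa");
        model.train("not_his", "bbb bbb bbb bbb");
        System.out.println(model.classify("aaa aaa", 0.5));
        System.out.println(model.classify("zzz zzz", 0.5));
    }
}
```

Here "unsure" is not a trained category but a fallback for when neither
label wins by a clear margin, which seemed more natural to me than
collecting "unsure" training texts.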

One other miscellaneous question: will we need to add part-of-speech
tags to the corpus in order to create the model?

Thanks in advance!
Chris Yocum
