Hello everyone, I am working with a student at my university on using NLP techniques for document categorisation of late Middle Irish texts. I am a coder and I know Java, so that won't be a problem. We are building a corpus at the moment.
We are working on one specific author, and we would like to use NLP to judge whether a particular poem or text is his. I think we will need a few things:

1) A corpus of Middle Irish texts from the same general linguistic range (we are working on that at the moment). Is there any documentation or guidance on how to create this, or does it just amount to training the POS tagger?
2) Train a model.
3) Pass that model to the document categoriser, together with the categories to use (his, not his, and unsure).

One other question: will we need to put part-of-speech tags in the corpus to create the model?

Thanks in advance!
Chris Yocum
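For illustration, the three steps above (labelled corpus, trained model, categoriser with his / not his / unsure) can be sketched in plain Java. This is a toy word-count Naive Bayes classifier written only for this example, not the API of any particular NLP library. The class name `AuthorshipSketch`, the labels, the `margin` threshold, and the placeholder tokens (which are not real Middle Irish) are all assumptions. It works on whitespace tokens alone, which suggests that a model of this kind does not strictly need part-of-speech tags, although tags may still help as extra features.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: two-class Naive Bayes with an "unsure" band.
class AuthorshipSketch {
    static final String HIS = "his";
    static final String NOT_HIS = "not_his";
    static final String UNSURE = "unsure";

    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> totalTokens = new HashMap<>();
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int docCount = 0;

    // Lower-case and split on whitespace; no POS tags are involved.
    static String[] tokens(String text) {
        String t = text.toLowerCase(Locale.ROOT).trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Steps 1 and 2: add one labelled document from the corpus to the model.
    void train(String label, String text) {
        Map<String, Integer> c = counts.computeIfAbsent(label, k -> new HashMap<>());
        for (String tok : tokens(text)) {
            c.merge(tok, 1, Integer::sum);
            totalTokens.merge(label, 1, Integer::sum);
            vocabulary.add(tok);
        }
        docsPerLabel.merge(label, 1, Integer::sum);
        docCount++;
    }

    // Log prior plus add-one-smoothed log likelihood of the text under a label.
    double score(String label, String text) {
        Map<String, Integer> c = counts.getOrDefault(label, new HashMap<>());
        int total = totalTokens.getOrDefault(label, 0);
        double s = Math.log(docsPerLabel.get(label) / (double) docCount);
        for (String tok : tokens(text)) {
            int n = c.getOrDefault(tok, 0);
            s += Math.log((n + 1.0) / (total + vocabulary.size()));
        }
        return s;
    }

    // Step 3: categorise; scores closer together than `margin` give "unsure".
    String classify(String text, double margin) {
        if (!docsPerLabel.containsKey(HIS) || !docsPerLabel.containsKey(NOT_HIS)) {
            throw new IllegalStateException("train both classes first");
        }
        double diff = score(HIS, text) - score(NOT_HIS, text);
        if (Math.abs(diff) < margin) {
            return UNSURE;
        }
        return diff > 0 ? HIS : NOT_HIS;
    }

    public static void main(String[] args) {
        AuthorshipSketch model = new AuthorshipSketch();
        model.train(HIS, "a a b");      // placeholder tokens, not real text
        model.train(NOT_HIS, "c c d");
        System.out.println(model.classify("a b", 1.0)); // his
        System.out.println(model.classify("c d", 1.0)); // not_his
        System.out.println(model.classify("x y", 1.0)); // unsure
    }
}
```

The "unsure" category is not trained as a class here; it falls out of the margin between the two scores. Whether that is preferable to a third trained category is a design choice that would need testing on the real corpus.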