Katrin Tomanek wrote:
Dear Andrew,

I am new to UIMA and am trying to find the best tool for doing human
document annotation.  For instance, if I am building a machine-learning
based named entity tagger and I want to tag some text with named
entities to train my recognizers, what would be the best way to do that?

I think that's a matter of human/manual annotation. Generating training
material for ML is a laborious task, and not one that UIMA itself
addresses (as far as I understand). Depending on the entities, the
domain, and the language you are interested in, you might find corpora
that are already annotated (you might check
http://torvald.aksis.uib.no/corpora/ for existing corpora).

regards,
Katrin




Also check http://registry.dfki.de/ for software tools to manually
annotate text.  I have no personal experience with any of the tools
there, but I have heard Alembic mentioned favorably.  It looks
like it is freely available.  It should be relatively easy to transform
the resulting XML to UIMA, either via XSLT, or with a custom XML
parser that reads the annotated data and feeds it into UIMA APIs.
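
As a rough illustration of the custom-parser route, here is a minimal
Python sketch that turns inline XML markup into plain text plus
stand-off spans with character offsets. The ENAMEX element, its TYPE
attribute, and the function name are assumptions chosen for
illustration, not the actual output format of Alembic or any other
tool. The UIMA side is deliberately left out: the returned text would
become the document text, and each span's begin/end offsets would be
used to create the corresponding annotation.

```python
import xml.etree.ElementTree as ET


def inline_xml_to_standoff(xml_string):
    """Return (plain_text, spans) for an inline-annotated XML string.

    Each span is (tag, attributes, begin, end), with character offsets
    into plain_text. The root element is treated as a document wrapper
    and does not produce a span of its own.
    """
    root = ET.fromstring(xml_string)
    pieces = []  # text fragments in document order
    spans = []   # stand-off annotations
    offset = 0   # current character position in the plain text

    def walk(elem, is_root):
        nonlocal offset
        begin = offset
        if elem.text:
            pieces.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            walk(child, False)
            # Tail text follows the child's closing tag, so it lies
            # outside the child's span but inside the parent's.
            if child.tail:
                pieces.append(child.tail)
                offset += len(child.tail)
        if not is_root:
            spans.append((elem.tag, dict(elem.attrib), begin, offset))

    walk(root, True)
    spans.sort(key=lambda s: (s[2], -s[3]))  # by begin, outer spans first
    return "".join(pieces), spans


# Hypothetical input, invented for this example.
sample = (
    '<doc>Meet <ENAMEX TYPE="PERSON">Thilo</ENAMEX> at '
    '<ENAMEX TYPE="ORGANIZATION">IBM</ENAMEX>.</doc>'
)
text, spans = inline_xml_to_standoff(sample)
for tag, attrs, begin, end in spans:
    print(tag, attrs, begin, end, repr(text[begin:end]))
```

Keeping the offsets relative to the tag-free text is the important
part, since stand-off annotations only make sense against the exact
string they were computed from.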

BTW, I have recently hacked UIMA's CAS Visual Debugger for a colleague
to allow creating manual annotations.  That was a one-off, though, and
I haven't fed it back into the main code base.  If people are interested
in that kind of functionality, let me know.  We wouldn't want to compete
with a dedicated annotation tool, though.

--Thilo
