Katrin Tomanek wrote:
Dear Andrew,
I am new to UIMA and am trying to find the best tool for doing human
document annotation. For instance, if I am building a machine-learning-based
named entity tagger and I want to tag some text with named entities to
train my recognizers, what would be the best way to do that?
I think that's a matter of human/manual annotation. Generating training
material for ML is a laborious task, and not one that UIMA addresses (as
far as I understand). Depending on the entities, the domain and the
language you are interested in, you might find existing annotated corpora
(check http://torvald.aksis.uib.no/corpora/ for existing corpora).
regards,
Katrin
Also check http://registry.dfki.de/ for software tools to annotate text
manually. I have no personal experience with any of the tools there, but
I have heard Alembic mentioned favorably. It looks like it is freely
available. It should be relatively easy to convert the resulting XML into
UIMA annotations, either via XSLT or with a custom XML parser that reads
the annotated data and feeds it into the UIMA APIs.
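For the custom-parser route, the core job is turning inline markup into
standoff spans: plain text plus (begin, end, type) character offsets, which
is the form UIMA annotations take. Below is a minimal sketch of that half
only, written in Python for brevity. It assumes MUC-style inline tags such
as <ENAMEX TYPE="PERSON">...</ENAMEX>; the tag and attribute names are
assumptions (Alembic's actual output may differ), and Span, InlineTagHandler
and to_standoff are hypothetical names. The UIMA side (setting the document
text on the CAS and creating one annotation of a suitable type per span) is
not shown.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class Span:
    begin: int  # character offset into the plain text (inclusive)
    end: int    # character offset into the plain text (exclusive)
    label: str  # entity type taken from the tag's type attribute


class InlineTagHandler(xml.sax.ContentHandler):
    """Strips inline tags and records a standoff span for each entity tag.

    Assumption: entities are marked as <ENAMEX TYPE="...">text</ENAMEX>.
    """

    def __init__(self, entity_tag="ENAMEX", type_attr="TYPE"):
        super().__init__()
        self.entity_tag = entity_tag
        self.type_attr = type_attr
        self.chunks = []  # pieces of plain text, joined at the end
        self.offset = 0   # length of the plain text seen so far
        self.open = []    # stack of (begin, label) for entity tags still open
        self.spans = []

    def characters(self, content):
        # SAX may deliver text in several pieces; only the running length matters.
        self.chunks.append(content)
        self.offset += len(content)

    def startElement(self, name, attrs):
        if name == self.entity_tag:
            self.open.append((self.offset, attrs.get(self.type_attr, "UNKNOWN")))

    def endElement(self, name):
        if name == self.entity_tag:
            begin, label = self.open.pop()
            self.spans.append(Span(begin, self.offset, label))


def to_standoff(annotated, entity_tag="ENAMEX", type_attr="TYPE"):
    """Return (plain_text, spans) for an inline-annotated XML fragment."""
    handler = InlineTagHandler(entity_tag, type_attr)
    # Wrap in a root element so fragments with several top-level tags parse.
    xml.sax.parseString(("<doc>" + annotated + "</doc>").encode("utf-8"), handler)
    text = "".join(handler.chunks)
    # Sort by start offset; for equal starts, the longer (outer) span comes first.
    return text, sorted(handler.spans, key=lambda s: (s.begin, -s.end))
```

For example, to_standoff('Dear <ENAMEX TYPE="PERSON">Andrew</ENAMEX>.')
yields the text "Dear Andrew." and one PERSON span covering offsets 5 to 11.
Note that Python offsets count Unicode code points, while UIMA's Java
offsets count UTF-16 code units; the two agree unless the text contains
characters outside the Basic Multilingual Plane.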
BTW, I recently hacked UIMA's CAS Visual Debugger for a colleague to
support creating annotations manually. That was a one-off, though, and
I haven't fed it back into the main code base. If people are interested
in that kind of functionality, let me know. That said, we wouldn't want
to compete with a dedicated annotation tool.
--Thilo