Your problem is a pure NER (named entity recognition) problem, and yes,
OpenNLP would help if you had enough training data. 10 examples is
certainly not enough to train your own models, though.
If you're just looking for names of people or companies, you could use
the pre-trained name finder models available for OpenNLP. It should
work... Alternatively, if you can identify common morphological
similarities between your entities, then perhaps you can formulate them
as regex.
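To make the regex idea a bit more concrete, here is a minimal sketch in plain Java (standard library only, no OpenNLP). Everything in it is an assumption for illustration, not something taken from your data: the class name AddressRegexSketch, the idea that company lines end in a legal-form suffix (GmbH, AG, KG, Inc., Ltd.), and the idea that person lines start with an honorific (Mr., Mrs., Ms., Dr., Herr, Frau). You would have to replace both patterns with whatever regularities your real addresses actually show.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a company line and a person line out of an address block.
final class AddressRegexSketch {

    // Assumption: a company line ends in a legal-form suffix such as GmbH or Ltd.
    private static final Pattern COMPANY = Pattern.compile(
        "^[ \\t]*(.+?[ \\t](?:GmbH|AG|KG|Inc\\.?|Ltd\\.?))[ \\t]*$",
        Pattern.MULTILINE);

    // Assumption: a person line starts with an honorific followed by capitalised words.
    private static final Pattern PERSON = Pattern.compile(
        "^[ \\t]*(?:Mr\\.|Mrs\\.|Ms\\.|Dr\\.|Herr|Frau)[ \\t]+"
        + "(\\p{Lu}[\\p{L}'-]*(?:[ \\t]+\\p{Lu}[\\p{L}'-]*)*)[ \\t]*$",
        Pattern.MULTILINE);

    // Returns the first line that looks like a company name, if any.
    static Optional<String> findCompany(String address) {
        return firstGroup(COMPANY, address);
    }

    // Returns the name following the first honorific, if any.
    static Optional<String> findPerson(String address) {
        return firstGroup(PERSON, address);
    }

    private static Optional<String> firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String address = "Example Company GmbH\nMr. John Doe\nSample road 12\n12514 Somewhere";
        System.out.println(findCompany(address).orElse("(no company found)"));
        System.out.println(findPerson(address).orElse("(no person found)"));
    }
}
```

Note that this only finds entities that follow the assumed surface pattern: a name written without an honorific, or a company without a legal-form suffix, is silently missed.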
I think your best bet is to try the ready-made NER models, and if that
doesn't work as expected you can try regex, even though I don't think a
regex will identify names of people reliably, no matter how well formed
it is...
Hope that helps :)
Jim
On 06/08/13 14:22, Markus Marks wrote:
Hi all,
I'm a German computer science student currently writing my bachelor's
thesis. I'm writing to you because I'm quite desperate. I have to solve
an information extraction task and I'm not sure how to approach it. I
was hoping you could help me, or tell me whether OpenNLP would work for
this.
OK, here it comes:
Let's assume I have a sender's address from a letter, and I have a few
annotated examples.
new document example with annotation
Mr. XYZ Enterprise Something
Example Company John Doe
Sample road 12514 somewhere else
somewhere another road
something
something something else
So the problem is how to build a matching or learning algorithm that
can extract, for example, the name of the company or the name of a new
sender, given the annotated examples I can provide. The difficulty is
that not every sender is written in the same order or with the same
expressions.
The thing is that I only have very few examples, fewer than 10.
Do you have any suggestions on how to solve this? I would be really
thankful, since I'm very disappointed at not finding a solution.
Yours thankfully,
Markus