Hi,
Sorry for my late reply. I didn't fully understand your last email, but here
is what I meant:
Given a simple dictionary with the following columns:
Name    Type   Gender
Agatha  First  F
John    First  M
Smith   Both   B
where:
- "First" refers to a first name, "Last" (not in the example) refers to a
last name, and "Both" means the name can be either.
- "F" refers to female, "M" refers to male, and "B" refers to both genders.
and given the following two sentences:
1. "It was nice meeting you John. I hope we meet again soon."
2. "Yes, I met Mrs. Smith. I asked her her opinion about the case and felt
she knows something"
In the first example, the dictionary lookup shows that "John" is a male
name, so there is no need to go any further.
However, in the second example, the name "Smith", which is a family name in
our case, fits both males and females. Therefore, we need to extract
features from the surrounding context and perform a classification task.
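The two-step idea can be sketched as follows. This is only an illustration:
the dictionary holds the three toy entries from the table, and
`context_classifier` is a hypothetical placeholder for whatever trained
model handles the ambiguous cases.

```python
# Toy dictionary mirroring the table above: name -> (type, gender).
NAME_DICTIONARY = {
    "Agatha": ("First", "F"),
    "John": ("First", "M"),
    "Smith": ("Both", "B"),
}


def lookup_gender(name):
    """Return "M" or "F" when the dictionary alone decides, else None."""
    entry = NAME_DICTIONARY.get(name)
    if entry is None:
        return None  # unknown name: the dictionary cannot help
    _name_type, gender = entry
    return gender if gender in ("M", "F") else None


def resolve_gender(name, context_classifier):
    """Dictionary first; fall back to the classifier when undecided."""
    gender = lookup_gender(name)
    if gender is not None:
        return gender
    # Hypothetical callable: takes the name and returns "M" or "F".
    return context_classifier(name)
```

With this sketch, `lookup_gender("John")` is decided by the dictionary, while
"Smith" is handed over to the classifier.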
Here are some features that I think would be interesting to use:
. Presence of a male title (e.g. "Mr.") before the name.
Values={True, False}
. Presence of a female title (e.g. "Mrs.") before the name.
Values={True, False}
. Gender of the first personal pronoun (subject or object form) to the
right of the name. Values={MALE, FEMALE, UNCERTAIN, EMPTY}
. Distance between the name and the first personal pronoun to the right
(in words). Values=NUMERIC
. Gender of the second personal pronoun to the right of the name.
Values={MALE, FEMALE, UNCERTAIN, EMPTY}
. Distance between the name and the second personal pronoun to the right
(in words). Values=NUMERIC
. Gender of the third personal pronoun to the right of the name.
Values={MALE, FEMALE, UNCERTAIN, EMPTY}
. Distance between the name and the third personal pronoun to the right
(in words). Values=NUMERIC
. Gender of the first personal pronoun (subject or object form) to the left
of the name. Values={MALE, FEMALE, UNCERTAIN, EMPTY}
. Distance between the name and the first personal pronoun to the left
(in words). Values=NUMERIC
. Gender of the second personal pronoun to the left of the name.
Values={MALE, FEMALE, UNCERTAIN, EMPTY}
. Distance between the name and the second personal pronoun to the left
(in words). Values=NUMERIC
. Gender of the third personal pronoun to the left of the name.
Values={MALE, FEMALE, UNCERTAIN, EMPTY}
. Distance between the name and the third personal pronoun to the left
(in words). Values=NUMERIC
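A rough sketch of how these fourteen features could be computed is given
here. Several details are assumptions made for illustration only: the
whitespace tokenizer, the small title and pronoun tables (which also treat
"her" and "his" as pronouns), the EMPTY/0 padding when fewer than three
pronouns are found, and the choice to treat a title as part of the name, so
that distances to the left are counted from the title.

```python
MALE_TITLES = {"mr", "sir"}
FEMALE_TITLES = {"mrs", "ms", "miss"}
# Illustrative pronoun table; extend it for real data.
PRONOUN_GENDER = {
    "he": "MALE", "him": "MALE", "his": "MALE",
    "she": "FEMALE", "her": "FEMALE",
    "i": "UNCERTAIN", "me": "UNCERTAIN", "you": "UNCERTAIN",
    "we": "UNCERTAIN", "us": "UNCERTAIN",
    "they": "UNCERTAIN", "them": "UNCERTAIN",
}


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    tokens = (raw.strip(".,;:!?\"'()") for raw in text.split())
    return [tok for tok in tokens if tok]


def nearest_pronouns(tokens, start, step, limit=3):
    """Collect up to `limit` (gender, distance) pairs walking from `start`."""
    found = []
    i = start + step
    while 0 <= i < len(tokens) and len(found) < limit:
        gender = PRONOUN_GENDER.get(tokens[i].lower())
        if gender is not None:
            found.append((gender, abs(i - start)))
        i += step
    # Pad missing pronouns with EMPTY and a distance of 0.
    return found + [("EMPTY", 0)] * (limit - len(found))


def extract_features(text, name):
    """Return the 14 features in the order of the list above."""
    tokens = tokenize(text)
    idx = tokens.index(name)  # first occurrence; ValueError if absent
    title = tokens[idx - 1].lower() if idx > 0 else ""
    has_title = title in MALE_TITLES or title in FEMALE_TITLES
    # Assumption: a title counts as part of the name, so left distances
    # are measured from the title rather than from the name itself.
    left_start = idx - 1 if has_title else idx
    features = [title in MALE_TITLES, title in FEMALE_TITLES]
    for gender, distance in nearest_pronouns(tokens, idx, +1):
        features += [gender, distance]
    for gender, distance in nearest_pronouns(tokens, left_start, -1):
        features += [gender, distance]
    return features
```

Calling `extract_features` on the second sentence with the name "Smith"
returns one value per feature, in list order.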
In the second example, here are the values of these features:
F1 = False
F2 = True
F3 = UNCERTAIN
F4 = 1
F5 = FEMALE
F6 = 3
F7 = FEMALE
F8 = 4
F9 = UNCERTAIN
F10 = 2
F11 = EMPTY
F12 = 0
F13 = EMPTY
F14 = 0
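To train a classifier, each annotated occurrence of an ambiguous name would
become one such row of values plus a gender label. The sketch below turns
the row above into numbers. It assumes one-hot encoding for the categorical
values and a FEMALE label for this sentence; both are illustrative choices,
and any encoding your learning algorithm accepts would do.

```python
GENDER_VALUES = ["MALE", "FEMALE", "UNCERTAIN", "EMPTY"]


def encode(features):
    """Turn the 14 mixed-type features into a flat list of numbers."""
    row = []
    for value in features:
        if isinstance(value, bool):
            # Check bool first: in Python, bool is a subclass of int.
            row.append(1 if value else 0)
        elif isinstance(value, str):
            # One-hot encode the categorical gender value.
            row.extend(1 if value == g else 0 for g in GENDER_VALUES)
        else:
            row.append(value)  # distances are already numeric
    return row


# F1..F14 for the second sentence, as listed above.
example = [False, True, "UNCERTAIN", 1, "FEMALE", 3, "FEMALE", 4,
           "UNCERTAIN", 2, "EMPTY", 0, "EMPTY", 0]
# Assumed label for this occurrence of "Smith".
training_row = (encode(example), "FEMALE")
```

Each encoded row has 32 numbers: 2 for the title flags, 4 for each of the six
gender features, and 1 for each of the six distances.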
Of course, the choice of features depends on the type of data, and these
features might not work well for some texts, such as ones collected from
Twitter.
I hope this helps.
Best regards
Mondher
On Thu, Jun 30, 2016 at 7:42 PM, Damiano Porta
wrote:
> Hi Mondher,
> could you give me a raw example to understand how I should train the
> classifier model?
>
> Thank you in advance!
> Damiano
>
>
> 2016-06-30 6:57 GMT+02:00 Mondher Bouazizi :
>
> > Hi,
> >
> > I would recommend a hybrid approach where, in a first step, you use a
> > plain dictionary and then perform the classification if needed.
> >
> > It's straightforward, but I think it would give better performance than
> > just performing a classification task.
> >
> > In the first step you use a dictionary of names along with an attribute
> > specifying whether the name fits males, females or both. In case the
> > name fits males or females exclusively, there is no need to go any
> > further.
> >
> > If the name fits for both genders, or is a family name etc., a second
> > step is needed where you extract features from the context (surrounding
> > words, etc.) and perform a classification task using any machine
> > learning algorithm.
> >
> > Another way would be using the information itself (whether the name fits
> > for males, females or both) as a feature when you perform the
> > classification.
> >
> > Best regards,
> >
> > Mondher
> >
> > I am not sure
> >
> > On Wed, Jun 29, 2016 at 10:27 PM, Damiano Porta
> > wrote:
> >
> > > Awesome! Thank you so much William!
> > >
> > > 2016-06-29 13:36 GMT+02:00 William Colen :
> > >
> > > > To create a NER model OpenNLP extracts features from the context,
> > > > things such as: word prefix and suffix, next word, previous word,
> > > > previous word prefix and suffix, next word prefix and suffix etc.
> > > > When you don't configure the feature generator it will apply the
> > > > default:
> > > >
> > > >