Re: SentimentAnalysisParser updates

2016-07-01 Thread Mattmann, Chris A (3980)
No problem Jörn we’ll make it happen.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

On 7/1/16, 12:55 AM, "Joern Kottmann" wrote:

>Hello,
>
>would be nice to get a pull request for the work you did.
>
>Thanks,
>Jörn
>
>On Wed, Jun 29, 2016 at 8:08 PM, Anastasija Mensikova <
>mensikova.anastas...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Some updates on our SentimentAnalysisParser.
>>
>> For the past week I worked on making a pull request to Tika and on looking
>> for the right categorical open datasets to enhance my
>> SentimentAnalysisParser and make it categorical. Thanks to your help and
>> some research, we have decided on using SentiWordNet and Stanford
>> Sentiment Treebank to create Facebook reaction-like categories for
>> sentiment analysis.
>>
>> My next steps will include creating a pull request to OpenNLP, making my
>> parser categorical, and implementing AbstractEvaluatorTool and
>> AbstractCrossValidatorTool to yield some results that can be used on our
>> GH-page in the form of D3 graphs.
>>
>> Thank you for all of your help and have a great rest of the week!
>>
>> Thank you,
>> Anastasija
>>


Re: DeepLearning4J as a ML for OpenNLP

2016-07-01 Thread Joern Kottmann
Hello,

The people from deeplearning4j are rather nice, and I discussed with them for
a while how it could be used for OpenNLP. The state back then was that they
didn't properly support the sparse feature vectors we use in OpenNLP today;
instead we would need to use word embeddings. In the end I never tried it out,
but I think it might not be very difficult to get everything wired together.
The most difficult part is probably finding a deep learning model setup that
works well.
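
To illustrate the difference between the two input styles, here is a rough
Python sketch. It is not DL4J or OpenNLP code; the feature names, the toy
embedding table and both helper functions are made up for illustration only.
The first function builds sparse, string-valued indicator features for a
token, the second concatenates dense vectors for the same context window.

```python
def sparse_features(tokens, index):
    """Sparse view: a set of string indicators describing the token context."""
    word = tokens[index]
    prev_word = tokens[index - 1] if index > 0 else "<s>"
    next_word = tokens[index + 1] if index + 1 < len(tokens) else "</s>"
    return {
        "w=" + word,
        "prev=" + prev_word,
        "next=" + next_word,
        "suf=" + word[-3:],
    }


# Hypothetical 2-dimensional embeddings; real ones would be learned.
EMBEDDINGS = {
    "john": [0.1, 0.3],
    "met": [0.0, -0.2],
    "smith": [0.4, 0.1],
}
UNKNOWN = [0.0, 0.0]  # used for padding and out-of-vocabulary words


def dense_features(tokens, index, window=1):
    """Dense view: concatenated embeddings of the token and its neighbours."""
    vector = []
    for i in range(index - window, index + window + 1):
        if 0 <= i < len(tokens):
            vector.extend(EMBEDDINGS.get(tokens[i], UNKNOWN))
        else:
            vector.extend(UNKNOWN)  # position falls outside the sentence
    return vector


tokens = ["john", "met", "smith"]
print(sorted(sparse_features(tokens, 1)))
print(dense_features(tokens, 1))
```

The dense vector always has the same length (here 3 positions times 2
dimensions), whereas the sparse set grows with the number of distinct feature
strings seen in the data.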

Jörn

On Tue, Jun 28, 2016 at 11:23 PM, William Colen wrote:

> Hi,
>
> Do you think it would be possible to implement a ML based on DL4J?
>
> http://deeplearning4j.org/
>
> Thank you
> William
>


Re: Model to detect the gender

2016-07-01 Thread Mondher Bouazizi
Hi,

Sorry for my late reply. I didn't quite understand your last email, but here
is what I meant:

Given a simple dictionary with the following columns:

Name    Type   Gender
Agatha  First  F
John    First  M
Smith   Both   B

where:
- "First" refers to a first name, "Last" (not in the example) refers to a
last name, and "Both" means it can be either.
- "F" refers to female, "M" refers to male, and "B" refers to both genders.

and given the following two sentences:

1. "It was nice meeting you John. I hope we meet again soon."

2. "Yes, I met Mrs. Smith. I asked her her opinion about the case and felt
she knows something"

In the first example, the dictionary lists "John" as a male name, so there is
no need to go any further.
However, in the second example, the name "Smith", which is a family name in
our case, fits both males and females. Therefore, we need to extract features
from the surrounding context and perform a classification task.
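
The dictionary step can be sketched roughly as follows (Python, only as an
illustration). The dictionary contents come from the table above; the function
name and the convention of returning None when a classifier is needed are
assumptions made for this example.

```python
# Name -> (type, gender), taken from the small example dictionary.
NAME_DICTIONARY = {
    "agatha": ("First", "F"),
    "john": ("First", "M"),
    "smith": ("Both", "B"),
}


def lookup_gender(name):
    """Return "M" or "F" if the dictionary decides, else None.

    None means the name is unknown or ambiguous ("B"), so the caller
    should fall back to the context-based classification step.
    """
    entry = NAME_DICTIONARY.get(name.lower())
    if entry is None:
        return None
    _name_type, gender = entry
    return gender if gender in ("M", "F") else None


print(lookup_gender("John"))   # decided by the dictionary
print(lookup_gender("Smith"))  # ambiguous, needs the classifier
```
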
Here are some of the features I think would be interesting to use:

F1.  Presence of a male title (e.g. "Mr.") before the name
     Values={True, False}
F2.  Presence of a female title (e.g. "Mrs.") before the name
     Values={True, False}

F3.  Gender of the first personal pronoun (subject or object form) to the
     right of the name
     Values={MALE, FEMALE, UNCERTAIN, EMPTY}
F4.  Distance between the name and the first personal pronoun to the right
     (in words)
     Values=NUMERIC
F5.  Gender of the second personal pronoun to the right of the name
     Values={MALE, FEMALE, UNCERTAIN, EMPTY}
F6.  Distance between the name and the second personal pronoun to the right
     (in words)
     Values=NUMERIC
F7.  Gender of the third personal pronoun to the right of the name
     Values={MALE, FEMALE, UNCERTAIN, EMPTY}
F8.  Distance between the name and the third personal pronoun to the right
     (in words)
     Values=NUMERIC

F9.  Gender of the first personal pronoun (subject or object form) to the
     left of the name
     Values={MALE, FEMALE, UNCERTAIN, EMPTY}
F10. Distance between the name and the first personal pronoun to the left
     (in words)
     Values=NUMERIC
F11. Gender of the second personal pronoun to the left of the name
     Values={MALE, FEMALE, UNCERTAIN, EMPTY}
F12. Distance between the name and the second personal pronoun to the left
     (in words)
     Values=NUMERIC
F13. Gender of the third personal pronoun to the left of the name
     Values={MALE, FEMALE, UNCERTAIN, EMPTY}
F14. Distance between the name and the third personal pronoun to the left
     (in words)
     Values=NUMERIC

For the second example, the feature values are:

F1 = False
F2 = True
F3 = UNCERTAIN
F4 = 1
F5 = FEMALE
F6 = 3
F7 = FEMALE
F8 = 4
F9 = UNCERTAIN
F10 = 2
F11 = EMPTY
F12 = 0
F13 = EMPTY
F14 = 0
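
As a rough illustration, the values above can be computed with a few lines of
Python. This is only a sketch under some assumptions: the title and pronoun
lists are small and made up for the example, punctuation is dropped before
counting, a missing pronoun gets distance 0, and titles such as "Mrs." are not
counted as words when measuring distance (which is what F10 = 2 suggests).

```python
import re

MALE_TITLES = {"mr", "sir"}              # illustrative, not exhaustive
FEMALE_TITLES = {"mrs", "ms", "miss"}    # illustrative, not exhaustive
TITLES = MALE_TITLES | FEMALE_TITLES
PRONOUN_GENDER = {
    "he": "MALE", "him": "MALE",
    "she": "FEMALE", "her": "FEMALE",
    "i": "UNCERTAIN", "me": "UNCERTAIN", "you": "UNCERTAIN",
    "it": "UNCERTAIN", "we": "UNCERTAIN", "us": "UNCERTAIN",
    "they": "UNCERTAIN", "them": "UNCERTAIN",
}


def tokenize(text):
    """Lowercased words only; punctuation is dropped."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def pronoun_features(words, max_pronouns=3):
    """Gender and distance of the nearest pronouns.

    `words` must be ordered from nearest to farthest from the name.
    """
    feats = []
    distance = 0
    for word in words:
        if word in TITLES:
            continue  # assumption: titles do not count towards the distance
        distance += 1
        if word in PRONOUN_GENDER:
            feats += [PRONOUN_GENDER[word], distance]
            if len(feats) == 2 * max_pronouns:
                break
    while len(feats) < 2 * max_pronouns:
        feats += ["EMPTY", 0]  # pad when fewer pronouns were found
    return feats


def extract_features(text, name):
    """Return the 14 feature values [F1, ..., F14] for `name` in `text`."""
    tokens = tokenize(text)
    i = tokens.index(name.lower())
    previous = tokens[i - 1] if i > 0 else ""
    left = tokens[i - 1::-1] if i > 0 else []
    features = [previous in MALE_TITLES, previous in FEMALE_TITLES]
    features += pronoun_features(tokens[i + 1:])
    features += pronoun_features(left)
    return features


sentence = ("Yes, I met Mrs. Smith. I asked her her opinion about the case "
            "and felt she knows something")
print(extract_features(sentence, "Smith"))
```

Each such feature list, together with the known gender of the person, would
form one training example for whatever classifier is used.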

Of course the choice of features depends on the type of data, and these
features might not work well for some texts, such as ones collected from
Twitter.

I hope this helps.

Best regards

Mondher


On Thu, Jun 30, 2016 at 7:42 PM, Damiano Porta wrote:

> Hi Mondher,
> could you give me a raw example to understand how I should train the
> classifier model?
>
> Thank you in advance!
> Damiano
>
>
> 2016-06-30 6:57 GMT+02:00 Mondher Bouazizi :
>
> > Hi,
> >
> > I would recommend a hybrid approach where, in a first step, you use a
> > plain dictionary and then perform the classification if needed.
> >
> > It's straightforward, but I think it would give better performance than
> > just performing a classification task.
> >
> > In the first step you use a dictionary of names along with an attribute
> > specifying whether the name fits males, females or both. If the name
> > fits males or females exclusively, there is no need to go any further.
> >
> > If the name fits both genders, or is a family name etc., a second step
> > is needed where you extract features from the context (surrounding words,
> > etc.) and perform a classification task using any machine learning
> > algorithm.
> >
> > Another way would be using the information itself (whether the name fits
> > for males, females or both) as a feature when you perform the
> > classification.
> >
> > Best regards,
> >
> > Mondher
> >
> > I am not sure
> >
> > On Wed, Jun 29, 2016 at 10:27 PM, Damiano Porta 
> > wrote:
> >
> > > Awesome! Thank you so much William!
> > >
> > > 2016-06-29 13:36 GMT+02:00 William Colen :
> > >
> > > > To create a NER model OpenNLP extracts features from the context,
> > > > things such as: word prefix and suffix, next word, previous word,
> > > > previous word prefix and suffix, next word prefix and suffix etc.
> > > > When you don't configure the feature generator it will apply the
> > > > default:
> > > >
> > > >

Re: SentimentAnalysisParser updates

2016-07-01 Thread Joern Kottmann
Hello,

would be nice to get a pull request for the work you did.

Thanks,
Jörn

On Wed, Jun 29, 2016 at 8:08 PM, Anastasija Mensikova <
mensikova.anastas...@gmail.com> wrote:

> Hi everyone,
>
> Some updates on our SentimentAnalysisParser.
>
> For the past week I worked on making a pull request to Tika and on looking
> for the right categorical open datasets to enhance my
> SentimentAnalysisParser and make it categorical. Thanks to your help and
> some research, we have decided on using SentiWordNet and Stanford
> Sentiment Treebank to create Facebook reaction-like categories for
> sentiment analysis.
>
> My next steps will include creating a pull request to OpenNLP, making my
> parser categorical, and implementing AbstractEvaluatorTool and
> AbstractCrossValidatorTool to yield some results that can be used on our
> GH-page in the form of D3 graphs.
>
> Thank you for all of your help and have a great rest of the week!
>
> Thank you,
> Anastasija
>