Re: Sentiment Analysis Parser updates
Thank you Jason!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
Re: Sentiment Analysis Parser updates
Anastasija,

There might be a few appropriate sentiment datasets listed in my homework on Twitter sentiment analysis:

https://github.com/utcompling/applied-nlp/wiki/Homework5

There may also be some useful data sets in the Crowdflower Open Data collection:

https://www.crowdflower.com/data-for-everyone/

Hope this helps!

-Jason
Re: Sentiment Analysis Parser updates
Great work Anastasija!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
Sentiment Analysis Parser updates
Hi everyone,

Some updates on our Sentiment Analysis Parser work.

As you might have noticed, I have enhanced our website (the GitHub page) recently, polished it and made it more user-friendly. My next step will be sending a pull request to Tika. However, my main goal until the end of Google Summer of Code is to enhance the parser so that it works categorically (in other words, the sentiment determined won't be just positive or negative; it will fall into one of several categories). This means that my next step is to look for a categorical open data set (which I will hopefully do by the end of the weekend at the latest) and, of course, to enhance my model and training. After that I will look into how the confidence levels can be increased.

Have a great day/night!

Thank you,
Anastasija Mensikova.
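For reference, the multi-category setup described above could look roughly like the sketch below using OpenNLP's document categorizer. This is only an illustration, not the project's actual code: the file names, category labels and training parameters are placeholders.

    import java.io.File;
    import java.io.FileOutputStream;
    import opennlp.tools.doccat.DoccatFactory;
    import opennlp.tools.doccat.DoccatModel;
    import opennlp.tools.doccat.DocumentCategorizerME;
    import opennlp.tools.doccat.DocumentSample;
    import opennlp.tools.doccat.DocumentSampleStream;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class CategoricalSentimentSketch {
        public static void main(String[] args) throws Exception {
            // Training file (hypothetical): one sample per line in the form
            // "category<whitespace>text", e.g. "very-positive Loving this phone".
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("sentiment-train.txt")), "UTF-8");
            ObjectStream<DocumentSample> samples = new DocumentSampleStream(lines);

            TrainingParameters params = new TrainingParameters();
            params.put(TrainingParameters.ITERATIONS_PARAM, "100");
            params.put(TrainingParameters.CUTOFF_PARAM, "2");

            DoccatModel model = DocumentCategorizerME.train(
                    "en", samples, params, new DoccatFactory());

            // Whitespace tokenization is a simplification for the example.
            DocumentCategorizerME categorizer = new DocumentCategorizerME(model);
            String[] tokens = "What a wonderful day".split("\\s+");
            double[] outcomes = categorizer.categorize(tokens);
            System.out.println(categorizer.getBestCategory(outcomes));

            try (FileOutputStream out = new FileOutputStream("sentiment-categorical.bin")) {
                model.serialize(out);
            }
        }
    }

Because the categorizer returns a probability per category, the same call also gives a natural confidence value to report alongside the predicted label.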
Re: Performances of OpenNLP tools
It would be nice to get MASC support into the OpenNLP formats package.

Jörn

On Tue, Jun 21, 2016 at 6:18 PM, Jason Baldridge wrote:
> Jörn is absolutely right about that. Another good source of training data
> is MASC. I've got some instructions for training models with MASC here:
>
> https://github.com/scalanlp/chalk/wiki/Chalk-command-line-tutorial
>
> Chalk (now defunct) provided a Scala wrapper around OpenNLP functionality,
> so the instructions there should make it fairly straightforward to adapt
> MASC data to OpenNLP.
>
> -Jason
>
> On Tue, 21 Jun 2016 at 10:46 Joern Kottmann wrote:
>> There are some research papers which study and compare the performance of
>> NLP toolkits, but be careful: they often don't train the NLP tools on the
>> same data, and the training data makes a big difference to the performance.
>>
>> Jörn
>>
>> On Tue, Jun 21, 2016 at 5:44 PM, Joern Kottmann wrote:
>>> Just don't use the very old existing models; to get good results you have
>>> to train on your own data, especially if the domain of the data used for
>>> training and the data which should be processed doesn't match. The old
>>> models are trained on 90s news; those don't work well on today's news and
>>> probably much worse on tweets.
>>>
>>> OntoNotes is a good place to start if the goal is to process news. OpenNLP
>>> comes with built-in support to train models from OntoNotes.
>>>
>>> Jörn
>>>
>>> On Tue, Jun 21, 2016 at 4:20 PM, Mattmann, Chris A (3980)
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> This sounds like a fantastic idea.
>>>>
>>>> On 6/21/16, 12:13 AM, "Anthony Beylerian" <anthonybeyler...@hotmail.com>
>>>> wrote:
>>>>> +1
>>>>>
>>>>> Maybe we could put the results of the evaluator tests for each component
>>>>> somewhere on a webpage and update them on every release. This is of
>>>>> course provided there are reasonable data sets for testing each
>>>>> component. What do you think?
>>>>>
>>>>> Anthony
>>>>>
>>>>>> From: mondher.bouaz...@gmail.com
>>>>>> Date: Tue, 21 Jun 2016 15:59:47 +0900
>>>>>> Subject: Re: Performances of OpenNLP tools
>>>>>> To: dev@opennlp.apache.org
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thank you for your replies.
>>>>>>
>>>>>> Please, Jeffrey, accept once more my apologies for your receiving the
>>>>>> email twice.
>>>>>>
>>>>>> I also think it would be great to have such studies on the performance
>>>>>> of OpenNLP.
>>>>>>
>>>>>> I have been looking for this information and checked in many places,
>>>>>> including obviously Google Scholar, and I haven't found any serious
>>>>>> studies or reliable results. Most of the existing ones report the
>>>>>> performance of outdated releases of OpenNLP, and focus more on the
>>>>>> execution time or CPU/RAM consumption, etc.
>>>>>>
>>>>>> I think such a comparison will help not only evaluate the overall
>>>>>> accuracy, but also highlight the issues with the existing models (as a
>>>>>> matter of fact, the existing models fail to recognize many of the
>>>>>> hashtags in tweets: the tokenizer splits them into the "#" symbol and a
>>>>>> word that the PoS tagger also fails to recognize).
>>>>>>
>>>>>> Therefore, building Twitter-based models would also be useful, since
>>>>>> many of the works in academia / industry are focusing on Twitter data.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Mondher
>>>>>>
>>>>>> On Tue, Jun 21, 2016 at 12:45 AM, Jason Baldridge
>>>>>> <jasonbaldri...@gmail.com> wrote:
>>>>>>> It would be fantastic to have these numbers. This is an example of
>>>>>>> something that would be a great contribution by someone trying to
>>>>>>> contribute to open source and who is maybe just getting into machine
>>>>>>> learning and natural language processing.
>>>>>>>
>>>>>>> For Twitter-ish text, it'd be great to look at models trained and
>>>>>>> evaluated on the Tw
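A per-component benchmark of the kind discussed above could start as small as the sketch below, which measures the word accuracy of a POS model on a held-out file with OpenNLP's built-in evaluator. The model and data file names are placeholders, not an agreed setup.

    import java.io.File;
    import java.io.FileInputStream;
    import opennlp.tools.postag.POSEvaluator;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    public class PosModelBenchmark {
        public static void main(String[] args) throws Exception {
            // Model under test, e.g. a stock model or one retrained on
            // in-domain data such as tweets (file name is a placeholder).
            POSModel model = new POSModel(new FileInputStream("en-pos-maxent.bin"));

            // Held-out sentences, one per line, tokens annotated as word_TAG.
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("pos-test.txt")), "UTF-8");
            ObjectStream<POSSample> samples = new WordTagSampleStream(lines);

            POSEvaluator evaluator = new POSEvaluator(new POSTaggerME(model));
            evaluator.evaluate(samples);

            // The kind of per-component number that could be published and
            // refreshed with each release.
            System.out.println("Word accuracy: " + evaluator.getWordAccuracy());
        }
    }

OpenNLP ships comparable evaluator and cross-validator classes for the sentence detector, tokenizer and name finder, so the same pattern could be repeated per component and the resulting numbers posted on a webpage per release, as suggested above.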