Cool! This is a part-of-speech toolkit for Twitter:
http://www.ark.cs.cmu.edu/TweetNLP/
It's great that there is an NLP ecosystem developing around this new
"grammar". Are there Twitter monitoring services which use this type of
tool to fine-tune relevance? That would be a cool and resume-enhancing
technical report.
Lance
On 09/20/2013 10:59 AM, Michael Schmitz wrote:
You might find this package helpful--it's specifically for NER and tweets.
https://github.com/aritter/twitter_nlp
Peace. Michael
On Fri, Sep 13, 2013 at 3:49 AM, Siva Sakthi <[email protected]> wrote:
Hi,
We are using OpenNLP to find organizations (code below),
e.g.
1. Find out how Intel Xeon processors help make #EMC number 1 in backup at
#IDF13 going on now in San Francisco. #Speed2Lead Protect your data
OpenNLP returns "Intel" for the above sentence.
2. NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot
http://t.co/V0XLKrp3TI
OpenNLP returns "Intel Division Chief Lashes" for this one.
Issue 1: I don't understand why it returns a composite string in the second
case, instead of just "Intel".
Issue 2: The "Intel" in the second sentence is not the company Intel.
My code is as follows:
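import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public static String findOrg(String message) throws Exception {
    // Whitespace tokenization: '#' prefixes and trailing punctuation
    // stay attached to the words.
    String[] words = message.split(" ");
    TokenNameFinderModel tnf;
    // Close the model file once it has been read.
    try (InputStream orgIs = new FileInputStream("en-ner-organization.bin")) {
        tnf = new TokenNameFinderModel(orgIs);
    }
    NameFinderME nf = new NameFinderME(tnf);
    Span[] sp = nf.find(words);
    String[] a = Span.spansToStrings(sp, words);
    // Return one detected organization per line.
    StringBuilder sb = new StringBuilder();
    for (String org : a) {
        sb.append(org).append("\n");
    }
    return sb.toString();
}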
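One thing that might be worth trying before calling nf.find() is to clean up the tokens, since splitting on a single space leaves the '#' on hashtags, keeps URLs as tokens, and does not split on line breaks. The sketch below is only an illustration: the class TweetTokens and its normalize() method are hypothetical names, not part of OpenNLP, and the rules (drop links, strip leading '#' or '@', split off trailing punctuation) are assumptions about what could help. It will not by itself fix the "Intel Division" case.

```java
import java.util.ArrayList;
import java.util.List;

public class TweetTokens {

    // Hypothetical helper, not part of OpenNLP: turns a raw tweet into
    // plain word tokens that can be passed to a name finder.
    public static String[] normalize(String tweet) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of whitespace, so line breaks are handled too.
        for (String raw : tweet.trim().split("\\s+")) {
            // Skip empty pieces and links.
            if (raw.isEmpty() || raw.startsWith("http://") || raw.startsWith("https://")) {
                continue;
            }
            // Strip the leading '#' of hashtags and '@' of mentions.
            String word = raw.replaceFirst("^[#@]+", "");
            // Move trailing sentence punctuation into its own token.
            String tail = "";
            while (!word.isEmpty()
                    && ".,!?;:".indexOf(word.charAt(word.length() - 1)) >= 0) {
                tail = word.charAt(word.length() - 1) + tail;
                word = word.substring(0, word.length() - 1);
            }
            if (!word.isEmpty()) {
                tokens.add(word);
            }
            if (!tail.isEmpty()) {
                tokens.add(tail);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String tweet = "Find out how Intel Xeon processors help make #EMC number 1 "
                + "in backup at #IDF13 going on now in San Francisco.";
        System.out.println(String.join(" | ", normalize(tweet)));
    }
}
```

The resulting array could be used in place of message.split(" ") in findOrg() above. Whether it changes what the organization model returns would have to be checked on real tweets.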
Thanks,
Ss