You might find this package helpful; it is built specifically for NER on tweets:
https://github.com/aritter/twitter_nlp
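
Separately, it may be worth looking at how the tweet is tokenized before it reaches the name finder. The quoted code splits on single spaces, so hashtags, URLs and trailing punctuation stay glued to the words ("Francisco." is one token). OpenNLP ships its own tokenizers, which are probably the first thing to try. The stdlib-only sketch below just illustrates the kind of cleanup involved. `TweetTokens` and `tokenize` are hypothetical names, not part of OpenNLP, and the rules (drop URLs, strip a leading # or @, peel off trailing punctuation) are assumptions about what might help, not something measured on this data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of OpenNLP: a rough tweet pre-tokenizer.
public class TweetTokens {

    // Splits on whitespace, drops URLs, strips a leading '#' or '@',
    // and emits trailing punctuation marks as separate tokens.
    public static String[] tokenize(String message) {
        List<String> tokens = new ArrayList<>();
        for (String raw : message.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields one empty string from split
            }
            if (raw.startsWith("http://") || raw.startsWith("https://")) {
                continue; // assumption: links carry no organization names
            }
            String word = raw;
            if (word.length() > 1 && (word.charAt(0) == '#' || word.charAt(0) == '@')) {
                word = word.substring(1); // "#EMC" becomes "EMC"
            }
            // Find where the run of trailing punctuation starts.
            int end = word.length();
            while (end > 0 && isTrailingPunct(word.charAt(end - 1))) {
                end--;
            }
            if (end > 0) {
                tokens.add(word.substring(0, end));
            }
            // Each trailing punctuation mark becomes its own token.
            for (int i = end; i < word.length(); i++) {
                tokens.add(String.valueOf(word.charAt(i)));
            }
        }
        return tokens.toArray(new String[0]);
    }

    private static boolean isTrailingPunct(char c) {
        return ".,;:!?".indexOf(c) >= 0;
    }
}
```

With something like this in place, the call in the quoted code would become nf.find(TweetTokens.tokenize(message)) instead of nf.find(message.split(" ")). It will not by itself deal with headline-style capitalization such as "Intel Division Chief Lashes", so treat it as one thing to try rather than a fix.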

Peace.
Michael

On Fri, Sep 13, 2013 at 3:49 AM, Siva Sakthi <[email protected]> wrote:
> Hi,
> we are using opennlp for finding organizations (code below)
>
> e.g.
>
> 1. Find out how Intel Xeon processors help make #EMC number 1 in backup at
> #IDF13 going on now in San Francisco. #Speed2Lead Protect your data >>>
> Opennlp returns "Intel" in the above sentence
>
> 2. NYPD Intel Division Chief Lashes Out At FBI Over Failed Terrorist Plot
> http://t.co/V0XLKrp3TI >>>
> Opennlp returns "Intel Division Chief Lashes"
>
> Issue 1: I don't understand why it returns a composite string in the second
> case, instead of just Intel
> Issue 2: The "Intel" in the second sentence is not really "Intel"
>
> My code as follows,
>
> public static String findOrg(String message) throws Exception {
>     String[] words = message.split(" ");
>     InputStream orgIs = new FileInputStream("en-ner-organization.bin");
>     TokenNameFinderModel tnf = new TokenNameFinderModel(orgIs);
>     NameFinderME nf = new NameFinderME(tnf);
>     Span sp[] = nf.find(words);
>     String a[] = Span.spansToStrings(sp, words);
>     StringBuilder sb = new StringBuilder();
>     int l = a.length;
>
>     for (int j = 0; j < l; j++) {
>         sb = sb.append(a[j] + "\n");
>     }
>
>     return sb.toString();
> }
>
> Thanks,
> Ss
