Eldad,

It is possible.

1) This is easy enough with the current architecture and models. First, pass the document or paragraph to the SentenceDetector, which splits it into sentences and returns them as a String array. Next, put each sentence through the Tokenizer, which breaks it into smaller parts: usually words, but it also separates punctuation from the words. This is done one sentence at a time and returns the tokens as a String array. At that point you have the raw data needed for most of the other models. From your description, you will want to use the NameFinder and its supporting models to tag people, locations, organizations, and the like.
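
To make the order of the steps concrete, here is a rough, JDK-only sketch of the same pipeline shape: paragraph to sentences, each sentence to tokens, tokens to tagged names. It is not OpenNLP code. The class PipelineSketch, its three methods, and the hard-coded name set are hypothetical stand-ins built on simple regular expressions, and they will be far less accurate than the trained models. In a real setup each method would be replaced by the matching OpenNLP component (SentenceDetectorME, TokenizerME, NameFinderME) loaded from the model files on the download page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the pipeline; none of this is the OpenNLP API.
final class PipelineSketch {

    // A token is either a run of word characters or one punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Step 1: split a paragraph into sentences, returned as a String array.
    // Naive rule: break on whitespace that follows '.', '!' or '?'.
    static String[] detectSentences(String paragraph) {
        String trimmed = paragraph.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("(?<=[.!?])\\s+");
    }

    // Step 2: split one sentence into tokens, moving punctuation away
    // from the words it is attached to.
    static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    // Step 3: tag names. Assumption: a fixed lookup set replaces the
    // trained name finder model purely for illustration.
    static List<String> findNames(String[] tokens, Set<String> knownNames) {
        List<String> found = new ArrayList<>();
        for (String token : tokens) {
            if (knownNames.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String paragraph = "Eldad asked about OpenNLP. James replied, and sent two links!";
        Set<String> knownNames = new HashSet<>(Arrays.asList("Eldad", "James"));

        // Tokenization and name finding run once per sentence.
        for (String sentence : detectSentences(paragraph)) {
            String[] tokens = tokenize(sentence);
            System.out.println(String.join(" | ", tokens)
                    + "  -> names: " + findNames(tokens, knownNames));
        }
    }
}
```

The point to carry over is the per-sentence loop: the sentence detector runs once on the whole text, and the tokenizer and name finder then run on each sentence in turn.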

2) Not sure what you mean by linking documents to others...

3) We don't support all languages yet, mostly because training and test data need to be collected over many months and parsed before models can be trained. Many groups have already done some of this work; unfortunately, most of it is copyrighted and, in some cases, difficult to obtain.

This should get you started:
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html

Download the release here. Don't forget the models toward the bottom.
http://incubator.apache.org/opennlp/download.cgi

Let us know if you need anything else.

James

On 6/4/2011 12:30 PM, Eldad Yamin wrote:
> Hello everyone,
> After researching about NLP I have found the OpenNLP as one of the most
> promising solution at the moment.
> however, I'm still looking for instruction on how to make the OpenNLP fit to
> my needs.
>
> I need the OpenNLP to:
> 1. get as input a sentence/paragraph and in return IE, annotation, named
> entities (people, locations, organizations) and (numbers, dates, etc.).
> 2. to use the OpenNLP to link documents to others.
> 3. to support multi languages.
>
> Please advise,
> Eldad.
