Eldad,

It is possible.
1)  This is easy enough with the current architecture and models.
Basically, you pass in the document or paragraphs and split the text
into sentences using the SentenceDetector, which returns a String array
of sentences.  Next, the output from the sentence detector is put
through the Tokenizer, which breaks each sentence into smaller parts:
usually words, but it also separates punctuation from the words.  This
is done for each sentence and returns a String array of tokens.  From
here you have the raw data needed for most of the other models.  From
your description, you will want to use the NameFinder and its
supporting models to tag people, locations, organizations, and the
like.
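
To make the data flow concrete, here is a rough sketch of the first two
steps (paragraph -> sentences -> tokens).  Note that this is NOT the
OpenNLP API: it uses only the Java standard library (BreakIterator and a
regular expression) as stand-ins, and the class and method names
(PipelineSketch, splitSentences, tokenize) are made up for illustration.
With OpenNLP you would load the downloaded models and use the
SentenceDetector and Tokenizer instead, then hand each token array to the
NameFinder, which is left out here because it needs a trained model.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipelineSketch {

    // Stand-in for a tokenizer model: a run of letters/digits is one token,
    // and every other non-whitespace character (punctuation) is its own token.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Stand-in for the sentence detector: paragraph in, String array of sentences out.
    public static String[] splitSentences(String paragraph) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(paragraph);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences.toArray(new String[0]);
    }

    // Stand-in for the tokenizer: one sentence in, String array of tokens out.
    public static String[] tokenize(String sentence) {
        Matcher m = TOKEN.matcher(sentence);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String paragraph = "James lives in Boston. He works for Apache.";
        // Tokenize each detected sentence separately, as described above.
        for (String sentence : splitSentences(paragraph)) {
            String[] tokens = tokenize(sentence);
            // In a real pipeline, this token array is what the name finder consumes.
            System.out.println(String.join(" | ", tokens));
        }
    }
}
```

The point of the sketch is only the shape of the data: one String array
of sentences, then one String array of tokens per sentence.  Rule-based
splitting like this will make mistakes (abbreviations, for example) that
the trained models are meant to handle.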

2)  I'm not sure what you mean by linking documents to others....

3)  We don't support all languages yet, mostly because training and
test data have to be collected over many months and parsed before
models can be trained on them.  Many groups have already done some of
this work; unfortunately, most of it is copyrighted and, in some cases,
difficult for everyone to obtain.

This should get you started.
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html

Download the release here...  Don't forget the models toward the bottom.
http://incubator.apache.org/opennlp/download.cgi

Let us know if you need anything else.

James

 
On 6/4/2011 12:30 PM, Eldad Yamin wrote:
> Hello everyone,
> After researching about NLP I have found the OpenNLP as one of the most
> promising solution at the moment.
> however, I'm still looking for instruction on how to make the OpenNLP fit to
> my needs.
>
> I need the OpenNLP to:
> 1. get as input a sentence/paragraph and in return IE, annotation, named
> entities (people, locations, organizations) and   (numbers, dates, etc .).
> 2. to use the OpenNLP to link documents to others.
> 3. to support multi languages.
>
> Please advise,
> Eldad.
>
