Re: svn commit: r1681259 - in /opennlp/trunk: opennlp-distr/pom.xml opennlp-docs/pom.xml opennlp-tools/pom.xml opennlp-uima/pom.xml pom.xml

2015-09-03 Thread Joern Kottmann
Hello,

Yes, the GitHub apache/opennlp repository is always synchronized with our
Subversion repository here at Apache.
If you have a look, you will see recent changes in there.

Jörn

On Tue, May 26, 2015 at 6:07 AM, Ethan Wang wrote:

> Hey folks,
>
> is g...@github.com:apache/opennlp.git still an official place for this
> project? If so is there anyone doing sync between svn and that?
>
> Thanks,
>
> Ethan
>
>
>
> > On May 22, 2015, at 9:19 PM, co...@apache.org wrote:
> >
> > Author: colen
> > Date: Sat May 23 02:19:41 2015
> > New Revision: 1681259
> >
> > URL: http://svn.apache.org/r1681259
> > Log:
> > [maven-release-plugin] prepare for next development iteration
> >
> > Modified:
> >opennlp/trunk/opennlp-distr/pom.xml
> >opennlp/trunk/opennlp-docs/pom.xml
> >opennlp/trunk/opennlp-tools/pom.xml
> >opennlp/trunk/opennlp-uima/pom.xml
> >opennlp/trunk/pom.xml
> >
> > Modified: opennlp/trunk/opennlp-distr/pom.xml
> > URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-distr/pom.xml?rev=1681259&r1=1681258&r2=1681259&view=diff
> >
> ==
> > --- opennlp/trunk/opennlp-distr/pom.xml (original)
> > +++ opennlp/trunk/opennlp-distr/pom.xml Sat May 23 02:19:41 2015
> > @@ -24,7 +24,7 @@
> >   
> >   org.apache.opennlp
> >   opennlp
> > - 1.6.0
> > + 1.6.1-SNAPSHOT
> >   ../pom.xml
> >   
> >
> > @@ -37,12 +37,12 @@
> >   
> >   org.apache.opennlp
> >   opennlp-tools
> > - 1.6.0
> > + 1.6.1-SNAPSHOT
> >   
> >   
> >   org.apache.opennlp
> >   opennlp-uima
> > - 1.6.0
> > + 1.6.1-SNAPSHOT
> >   
> >   
> >
> >
> > Modified: opennlp/trunk/opennlp-docs/pom.xml
> > URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-docs/pom.xml?rev=1681259&r1=1681258&r2=1681259&view=diff
> >
> ==
> > --- opennlp/trunk/opennlp-docs/pom.xml (original)
> > +++ opennlp/trunk/opennlp-docs/pom.xml Sat May 23 02:19:41 2015
> > @@ -24,7 +24,7 @@
> >   
> >   org.apache.opennlp
> >   opennlp
> > - 1.6.0
> > + 1.6.1-SNAPSHOT
> > ../pom.xml
> >   
> >
> >
> > Modified: opennlp/trunk/opennlp-tools/pom.xml
> > URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/pom.xml?rev=1681259&r1=1681258&r2=1681259&view=diff
> >
> ==
> > --- opennlp/trunk/opennlp-tools/pom.xml (original)
> > +++ opennlp/trunk/opennlp-tools/pom.xml Sat May 23 02:19:41 2015
> > @@ -25,7 +25,7 @@
> >   
> > org.apache.opennlp
> > opennlp
> > -1.6.0
> > +1.6.1-SNAPSHOT
> > ../pom.xml
> >   
> >
> >
> > Modified: opennlp/trunk/opennlp-uima/pom.xml
> > URL:
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-uima/pom.xml?rev=1681259&r1=1681258&r2=1681259&view=diff
> >
> ==
> > --- opennlp/trunk/opennlp-uima/pom.xml (original)
> > +++ opennlp/trunk/opennlp-uima/pom.xml Sat May 23 02:19:41 2015
> > @@ -25,7 +25,7 @@
> >   
> >   org.apache.opennlp
> >   opennlp
> > - 1.6.0
> > + 1.6.1-SNAPSHOT
> >   ../pom.xml
> > 
> >
> > @@ -46,7 +46,7 @@
> >   
> >   org.apache.opennlp
> >   opennlp-tools
> > - 1.6.0
> > + 1.6.1-SNAPSHOT
> >   
> >
> >   
> >
> > Modified: opennlp/trunk/pom.xml
> > URL:
> http://svn.apache.org/viewvc/opennlp/trunk/pom.xml?rev=1681259&r1=1681258&r2=1681259&view=diff
> >
> ==
> > Binary files - no diff available.
> >
> >
>
>


Best method to confirm an entity

2015-09-03 Thread Damiano Porta
Hello!
I would like to understand the best approach to the following problem.

I have documents that are very similar to resumes/CVs, and I have to extract
entities (name, surname, birthday, city, zip code, etc.).

To extract those entities I am combining different finders:

Birthday and zipcodes = RegexNameFinder
Name, Surname and Cities = DictionaryNameFinder.

There are no problems with those finders, but I am looking for a
method or algorithm to *confirm* the entities.

By "confirm" I mean that I have to find specific terms (or other entities)
in proximity to (close to) the entities I have found.

Example:

My name is 
Name: 
Name and Surname: 

I can confirm the entity  because it is close to a specific term that
lets me understand the "context". If the words "name" or "surname" appear
near the entity , I can say that I have found the  with good
probability.
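
The proximity rule described above could be sketched roughly as follows. This is a standalone Python illustration, not OpenNLP code: the trigger list, the `is_confirmed` helper, and the window size of 3 tokens are all assumptions made for the example, and the candidate span is assumed to come from a finder as character offsets.

```python
import re

# Hypothetical trigger terms; a real list would be tuned per entity type.
NAME_TRIGGERS = {"name", "surname"}

def tokenize(text):
    """Return (token, start, end) triples for word-like tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

def is_confirmed(text, span, triggers, window=3):
    """Return True if a trigger term lies within `window` tokens of the span.

    `span` is a (start, end) character offset pair for a candidate entity.
    """
    tokens = tokenize(text)
    start, end = span
    # Indices of the tokens that make up the candidate entity itself.
    inside = {i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end}
    if not inside:
        return False
    lo = max(0, min(inside) - window)
    hi = min(len(tokens), max(inside) + 1 + window)
    # Lower-cased tokens around the entity, excluding the entity tokens.
    context = {tokens[i][0].lower() for i in range(lo, hi) if i not in inside}
    return not context.isdisjoint(triggers)

text = "Name and Surname: John Smith"
start = text.index("John")
print(is_confirmed(text, (start, start + len("John Smith")), NAME_TRIGGERS))
```

A boolean is the simplest possible outcome; the same window could instead feed a score, for example by weighting triggers by their distance from the entity.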

So the goal is to write rules of this kind to confirm entities. Another
example would be:

My address is .., 00143 Rome

Italian zip codes are 5 digits long (numeric only). It is easy to find a
5-digit number inside my document (I use a regex, as I wrote above), and I
also check it by querying a database to see whether the number exists. The
problem here is that I need one more check to confirm it definitively.

I must check whether that number is near the entity ; if it is, I have a
good probability.
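
As a rough illustration of this zip code check, the sketch below finds 5-digit tokens and keeps only those with a known city name within a couple of tokens. It is plain Python rather than OpenNLP code; `KNOWN_CITIES`, `confirmed_zipcodes`, and the `max_gap` of 2 tokens are assumptions for the example, the city set stands in for a real dictionary, and multi-word city names are not handled. The database lookup mentioned above is assumed to happen separately.

```python
import re

# Hypothetical stand-in for a city dictionary.
KNOWN_CITIES = {"rome", "milan", "turin"}

def confirmed_zipcodes(text, cities=KNOWN_CITIES, max_gap=2):
    """Return (zipcode, city) pairs where a known city is near a 5-digit token.

    A city counts as near when it is at most `max_gap` tokens before or
    after the zip code candidate.
    """
    words = [m.group() for m in re.finditer(r"\w+", text)]
    results = []
    for i, word in enumerate(words):
        # Exactly five ASCII digits, as for Italian zip codes.
        if not re.fullmatch(r"[0-9]{5}", word):
            continue
        neighbours = words[max(0, i - max_gap):i] + words[i + 1:i + 1 + max_gap]
        for candidate in neighbours:
            if candidate.lower() in cities:
                results.append((word, candidate))
                break
    return results

print(confirmed_zipcodes("My address is .., 00143 Rome"))
```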

I also tried to train a model, but I do not really have a "context"
(sentences).
Training the model with:

My name is: John
Name: John
Name/Surname: John
John is my name

does not sound good to me because:
1. I have read that we need many sentences to train a good model.
2. Those are not "sentences", so I do not have a "context" (remember, as I
said, the document is similar to a resume/CV).
3. Maybe those phrases are too short.

I do not know how many different ways I could find to say exactly the same
thing, but surely I cannot find 15000 ways :)

What method should I use to try to confirm my entities?

Thank you so much!