Re: NER Features

2018-01-30 Thread Damiano Porta
P. I would suggest looking at the find() > method and align what that method does with my comments on the steps you > need to take. > > Hope it helps... > Daniel > > > On Jan 30, 2018, at 12:10 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > > He

NER Features

2018-01-30 Thread Damiano Porta
Hello everybody, how can we find out which features are the most important during the NER process? I mean, when the TokenNameFinder selects a label, is it possible to retrieve the most important features too? Thanks Damiano
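One rough way to approach this question for a linear model such as maxent or perceptron: the score of an outcome is built from the weights of the features that are active for the token, so those active features can be ranked by their weight for the chosen label. The sketch below assumes you have already extracted such weights into a map keyed as "feature|outcome". That map layout, the feature strings and the `topFeatures` helper are all hypothetical, not an OpenNLP API. The ranking also ignores how the same features push the competing outcomes, so treat it as a diagnostic rather than a full explanation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FeatureContributions {

    /**
     * Ranks the active features of one token by the weight they contribute to
     * the chosen outcome. Weights are keyed as "feature|outcome" (a layout
     * assumed for this sketch); missing pairs count as 0.
     */
    static List<String> topFeatures(List<String> activeFeatures, String outcome,
                                    Map<String, Double> weights, int k) {
        List<String> ranked = new ArrayList<>(activeFeatures);
        // Sort by descending weight for the selected outcome.
        ranked.sort(Comparator.comparingDouble(
                (String f) -> weights.getOrDefault(f + "|" + outcome, 0.0)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}
```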

Re: Spelling correction

2017-07-01 Thread Damiano Porta
t to have this feature in OpenNLP. > > I am not aware of any papers on this, but the first thing that comes to > mind and is irrelevant is the 'Noisy channel'. > > > > On Sat, Jul 1, 2017 at 2:04 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello ev

Spelling correction

2017-07-01 Thread Damiano Porta
Hello everybody, I am working on normalizing very noisy sentences with many spelling errors. Do you know a good paper that explains how to build a model to fix this kind of problem? I can share the code without problems if you are interested in integrating it into OpenNLP.
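The "noisy channel" idea mentioned in the reply can be sketched very simply: generate every string one edit away from the observed word and pick the candidate the language model rates most likely. In this minimal sketch the language model is just a word-frequency map and every single edit is assumed equally probable, which is a strong simplification of a real error model. The class name and the frequency map are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NoisyChannelSpeller {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";
    private final Map<String, Integer> counts; // word -> corpus frequency (the "language model")

    NoisyChannelSpeller(Map<String, Integer> counts) {
        this.counts = counts;
    }

    /** All strings one deletion, transposition, replacement or insertion away. */
    static Set<String> edits1(String word) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (!right.isEmpty()) {
                out.add(left + right.substring(1)); // deletion
            }
            if (right.length() > 1) {
                out.add(left + right.charAt(1) + right.charAt(0) + right.substring(2)); // transposition
            }
            for (char c : LETTERS.toCharArray()) {
                if (!right.isEmpty()) {
                    out.add(left + c + right.substring(1)); // replacement
                }
                out.add(left + c + right); // insertion
            }
        }
        return out;
    }

    /** Keeps known words; otherwise returns the most frequent known word one edit away. */
    String correct(String word) {
        if (counts.containsKey(word)) {
            return word;
        }
        String best = word;
        int bestCount = 0;
        for (String candidate : edits1(word)) {
            int c = counts.getOrDefault(candidate, 0);
            if (c > bestCount) {
                best = candidate;
                bestCount = c;
            }
        }
        return best;
    }
}
```

A fuller noisy-channel model would weight each candidate by P(typo | candidate) instead of treating all edits alike.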

Re: Missing serializer for postagger.bin

2017-06-09 Thread Damiano Porta
s.xml file: 2017-06-09 15:17 GMT+02:00 Damiano Porta <damianopo...@gmail.com>: > Jorn, > At the moment i am using the command tool to train my ner mod

Re: Missing serializer for postagger.bin

2017-06-09 Thread Damiano Porta
ror):* > > > *Indexing events using cutoff of 0 Computing event counts... done. 30 events Indexing... done. Collecting events... Done indexing. Incorporating indexed data for training... done. Number of Event Tokens: 30 Number of Outcomes: 2 Number of Predicates: 144 Comp

Re: Missing serializer for postagger.bin

2017-06-07 Thread Damiano Porta
ccuracy less than 1.0E-5 Stats: (30/30) 1.0 ...done. Compressed 144 parameters to 621 outcome patterns java.lang.IllegalStateException: Missing serializer for postagger.bin at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589) at com.damiano.trainer.Test.<init>(Test.java:75) at com.damiano.

CrossValidator with folds=1 gives me F-Measure: 0.116

2017-06-06 Thread Damiano Porta
Hello, I am getting very strange results with *TokenNameFinderCrossValidator* API. My generators.xml is: CODE: *try

AdditionalContextFeatureGenerator

2017-05-31 Thread Damiano Porta
Hello, can't we use AdditionalContextFeatureGenerator during training? I do not see the *ne=* feature during training; only the generators in my XML descriptor add features. How can I check whether this custom context is being used? I pass the context in the NameSample:

Re: Stemmer Feature Generator

2017-05-26 Thread Damiano Porta
Jorn, what is the current performance on CoNLL 2003? 2017-05-26 17:43 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > Hello, > > can you post performance numbers? Only if it helps with some data set it > would make sense to add it. > > Jörn > > On Thu, May 25,

Stemmer Feature Generator

2017-05-25 Thread Damiano Porta
Hello, do you think a StemmerFeatureGenerator could be useful for NER models? I can create a PR for it. Damiano

Re: CoReference

2017-05-19 Thread Damiano Porta
urrently, if you would like to work on it you > are more than welcome to get this back into opennlp-tools. > > Jörn > > On Thu, May 18, 2017 at 4:37 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Do you also have an example? :) > > > > On 18 May 201

Re: CoReference

2017-05-18 Thread Damiano Porta
Do you also have an example? :) On 18 May 2017 16:35, "Damiano Porta" <damianopo...@gmail.com> wrote: > Oh, my mistake. Pardon. > Do we have accuracy statistics? > > On 18 May 2017 14:59, "Joern Kottmann" <kottm...@gmail.com> wrote: > >

Re: CoReference

2017-05-18 Thread Damiano Porta
> > > > Here's the issue in question https://issues.apache.org/ > > > jira/browse/OPENNLP-48 and here's where I believe the code is now > located > > > https://svn.apache.org/repos/asf/opennlp/sandbox/opennlp-coref/ > > > > > > Not sure if

Re: CoReference

2017-05-18 Thread Damiano Porta
-coref/ > > Not sure if there was any other work not mentioned in that issue. > > Hope that helps > Bruno > > From: Damiano Porta <damianopo...@gmail.com> > To: dev@opennlp.apache.org > Sent: Thursday, 18 May 2017 10:54 PM > Subjec

CoReference

2017-05-18 Thread Damiano Porta
Hello everybody, I need a coreference solution to link my entities (DATE, PERSON, ORG). Can someone point me to where to start working on that? Thank you so much. Damiano

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
an issue? 2017-03-06 13:43 GMT+01:00 Damiano Porta <damianopo...@gmail.com>: > Oh I see. Thanks! > > Basically i have 30k sentences i apply the labels with a script and then i > pass 0-15k to train the model (to build the .bin) and 15k-30k to evaluate > it. > > I am trying

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
aining and 1 for testing, this is repeated n times, so that each > partition was once used for testing. > > It really should be three times as long in your case, maybe there is > something else wrong?' > > Jörn > > On Mon, Mar 6, 2017 at 12:36 PM, Damiano Porta <damianopo.
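The cross-validation scheme described in the reply (n-1 partitions for training, 1 for testing, repeated n times so each partition is tested once) can be sketched as below. The round-robin assignment of sample i to partition i % k is an assumption made for this sketch, and it rejects k < 2 because with a single partition there would be no training data left; it does not reflect how OpenNLP's own cross validator handles folds=1.

```java
import java.util.ArrayList;
import java.util.List;

class KFold {
    /** One train/test split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Assigns sample i to partition i % k, then builds k splits in which each
     * partition is the test set exactly once and the other k-1 form the training set.
     */
    static <T> List<Split<T>> splits(List<T> samples, int k) {
        if (k < 2) {
            throw new IllegalArgumentException("need at least 2 folds, got " + k);
        }
        List<Split<T>> result = new ArrayList<>();
        for (int fold = 0; fold < k; fold++) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < samples.size(); i++) {
                if (i % k == fold) {
                    test.add(samples.get(i));
                } else {
                    train.add(samples.get(i));
                }
            }
            result.add(new Split<>(train, test));
        }
        return result;
    }
}
```

Since k models are trained, each on (k-1)/k of the data, the total training work is roughly k-1 times that of a single model on all the data, assuming training time grows linearly with data size.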

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
near to the > iterations, if you use 300 instead of 100 it should take three times as > long. > > Jörn > > On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Jorn, > > I am training and testing the model via api. If it is no

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
command line? Which command? > > Jörn > > On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello Jorn, > > I tried with 300 iterations and it takes forever, reducing that number to > > 100 i can finally get the model in ha

Re: CUDA

2017-03-06 Thread Damiano Porta
. At some point we probably add support for one of > the deep learning packages and those usually use CUDA. > > Jörn > > On Sat, Mar 4, 2017 at 5:17 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello everybody, > > > > does OpenNLP support CUDA parallel computing? > > > > Damiano > > >

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
it is doing. Damiano 2017-03-06 10:19 GMT+01:00 Joern Kottmann <kottm...@gmail.com>: > Hello, > > this looks like output from the cross validator. > > Jörn > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta <damianopo...@gmail.com> > wrote: > >

Training perceptron model

2017-03-05 Thread Damiano Porta
Hello, I am training a NER model with the perceptron classifier (using OpenNLP 1.7.0). The output of the training is: Indexing events using cutoff of 0 Computing event counts... done. 11861603 events Indexing... done. Collecting events... Done indexing. Incorporating indexed data for training...

CUDA

2017-03-04 Thread Damiano Porta
Hello everybody, does OpenNLP support CUDA parallel computing? Damiano

BUG in NameSample

2017-03-03 Thread Damiano Porta
Hello everybody, I think I found a bug in NameSample. This is the use case: String[] tokens = new String[] { "0", "1", "2", "3", "4", ",", "6", "7", "8" }; Span[] spans = new Span[] { new Span(7,8, "zipcode"), new Span(1,7, "address"), }; NameSample n = new NameSample(tokens, spans, true);

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
the date with the NE class you will be fine. > As long as in testing time you use the same tokenization. > > Cheers, > > R > > On Thu, Mar 2, 2017 at 11:24 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hi Rodrigo, thanks for your message. >

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
ll all depend on the how the tokenizer will do it and how it is > annotated in the training data. In any case, the most important thing is > for the tokenization to be consistent for training and testing. > > HTH, > > Rodrigo > > ... > > On Thu, Mar 2, 2017 at 5:46

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
catch things like “call me at 3011234567.” even though > your regex wont match (if you look at the previous 4 words to catch “call > me”). > > > Daniel > > On 3/2/17, 12:24 PM, "Damiano Porta" <damianopo...@gmail.com> wrote: > > Hello Daniel, ye

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
ng the white space with > printable (though possible not an alphanumeric character like an > underscore)? > Daniel > > On 3/2/17, 11:46 AM, "Damiano Porta" <damianopo...@gmail.com> wrote: > > Hello everybody, > > i have created a custom tokenizer that does n

Tokenizer for NER training

2017-03-02 Thread Damiano Porta
Hello everybody, I have created a custom tokenizer that does not split specific "patterns" such as emails, telephone numbers and dates; I convert each of them into ONE single token. The rest of the text is tokenized with the SimpleTokenizer. The problem is when I need to train a NER model. For example if my
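The approach described here (protect certain patterns as single tokens, tokenize everything else normally) might look roughly like the sketch below. The two protected patterns (a simplified email and a 10-digit phone number) are assumptions, and `simpleSplit` is only a rough stand-in for a character-class tokenizer, not OpenNLP's SimpleTokenizer itself. Whatever tokenizer is used, as noted in the replies, it must be applied identically at training and at test time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternPreservingTokenizer {
    // Hypothetical "protected" patterns: a simplified email and a 10-digit phone number.
    private static final Pattern PROTECTED = Pattern.compile(
            "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+|\\b\\d{10}\\b");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PROTECTED.matcher(text);
        int last = 0;
        while (m.find()) {
            simpleSplit(text.substring(last, m.start()), tokens);
            tokens.add(m.group()); // keep the whole match as ONE token
            last = m.end();
        }
        simpleSplit(text.substring(last), tokens);
        return tokens;
    }

    /** Rough stand-in for a character-class tokenizer: letters, digits, other. */
    private static void simpleSplit(String s, List<String> out) {
        StringBuilder cur = new StringBuilder();
        int curType = -1;
        for (char c : s.toCharArray()) {
            int type = Character.isWhitespace(c) ? 0
                    : Character.isLetter(c) ? 1
                    : Character.isDigit(c) ? 2 : 3;
            // Flush on whitespace, on a class change, or after any "other" character.
            if (cur.length() > 0 && (type != curType || type == 3)) {
                out.add(cur.toString());
                cur.setLength(0);
            }
            if (type != 0) {
                cur.append(c);
                curType = type;
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
    }
}
```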

Re: Name Finder trainer default settings

2017-02-07 Thread Damiano Porta
I have good results with perceptron, but +1 for CRF 2017-02-07 15:42 GMT+01:00 Russ, Daniel (NIH/CIT) [E] : > Hi Jörn, > > > >I think the best entity recognition systems use CRF’s. At some point > we might want to consider adding them. As you know, ME classifiers suffer

Re: word2vec

2017-01-25 Thread Damiano Porta
Aha, yes, it helped me understand the input and output formats. OK, I will try to create the clusters using the official tool. Thanks! Damiano On 25 Jan 2017 21:54, "Rodrigo Agerri" <rage...@apache.org> wrote: It might, I forgot that :) R On Wed, Jan 25, 2017 at 9:43 P

Re: word2vec

2017-01-25 Thread Damiano Porta
ring algorithm and then pass it as explained > in the manual: > > http://opennlp.apache.org/documentation/1.7.1/manual/ > opennlp.html#tools.namefind.training > > HTH, > > R > > On Wed, Jan 25, 2017 at 8:09 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > >

Adding entries to resources without re-training the TokenNameFinderModel

2017-01-25 Thread Damiano Porta
Hello everybody, I am using the NameFinder tool with a custom TokenNameFinderModel. I built this model using several DictionaryFeatureGenerators that reference dictionaries I loaded during training. TokenNameFinderFactory factory = new TokenNameFinderFactory( IOUtils.toByteArray(in),
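Why adding dictionary entries without retraining can work, at least in principle: if a dictionary feature names only the dictionary and the position inside a match (not the matched words themselves), a newly added entry fires features the model already has weights for. The sketch below illustrates that assumption with a hypothetical generator; the feature strings "dict=NAME-start" and "dict=NAME-cont" are made up for illustration and are not necessarily what OpenNLP's DictionaryFeatureGenerator emits.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class DictionaryFeatures {
    private final String name;
    private final Set<List<String>> entries; // each entry is a token sequence, e.g. [new, york]
    private final int maxLen;

    DictionaryFeatures(String name, Set<List<String>> entries) {
        this.name = name;
        this.entries = entries;
        int m = 0;
        for (List<String> e : entries) {
            m = Math.max(m, e.size());
        }
        this.maxLen = m;
    }

    /** One feature per token: dict=NAME-start, dict=NAME-cont, or null if unmatched. */
    String[] features(String[] tokens) {
        String[] feats = new String[tokens.length];
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            // Greedy longest match starting at token i.
            for (int len = Math.min(maxLen, tokens.length - i); len > 0; len--) {
                if (entries.contains(Arrays.asList(tokens).subList(i, i + len))) {
                    matched = len;
                    break;
                }
            }
            for (int j = 0; j < matched; j++) {
                feats[i + j] = "dict=" + name + (j == 0 ? "-start" : "-cont");
            }
            i += Math.max(matched, 1);
        }
        return feats;
    }
}
```

With the same model weights, adding "new york" to the dictionary makes those tokens fire the same "dict=city-start"/"dict=city-cont" features that "rome" already fires.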

Re: Suggestion/Query - Adding weights to words in Document Classifier

2017-01-18 Thread Damiano Porta
Manoj, you can add custom features using a generator that implements this interface: https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/FeatureGenerator.java Take a look at

Sentence's outcomes

2017-01-14 Thread Damiano Porta
Hello, using the find() method of NameFinderME I get a Span[]. Is it possible to get the list of outcomes as a String[] using the BIO codec? Thanks Damiano
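The Span[] returned by find() already carries enough information to rebuild one outcome per token. Below is a self-contained sketch in which a hypothetical `NameSpan` class stands in for opennlp.tools.util.Span (token start inclusive, end exclusive, plus a type). The "TYPE-start" / "TYPE-cont" / "other" labels are modeled on what OpenNLP's BioCodec appears to use; BioCodec also appears to offer an encode(Span[], int) method for this purpose, so check it against your version before writing your own.

```java
import java.util.Arrays;

class BioOutcomes {
    /** Minimal stand-in for a name span: tokens [start, end) of the given type. */
    static final class NameSpan {
        final int start;
        final int end;
        final String type;

        NameSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** One label per token: TYPE-start, TYPE-cont, or other. */
    static String[] encode(NameSpan[] spans, int numTokens) {
        String[] outcomes = new String[numTokens];
        Arrays.fill(outcomes, "other");
        for (NameSpan s : spans) {
            outcomes[s.start] = s.type + "-start";
            for (int i = s.start + 1; i < s.end; i++) {
                outcomes[i] = s.type + "-cont";
            }
        }
        return outcomes;
    }
}
```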

Re: Get Original text

2016-12-19 Thread Damiano Porta
> > On Sat, Dec 17, 2016 at 1:13 AM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello, > > is it possible to get/pass the original text inside a Custom NER Feature > > Generator somehow? > > > > Thanks > > Damiano > > >

Get Original text

2016-12-16 Thread Damiano Porta
Hello, is it possible to get/pass the original text inside a Custom NER Feature Generator somehow? Thanks Damiano

LemmatizerME via Maven

2016-12-04 Thread Damiano Porta
Hello, there is no LemmatizerME class in OpenNLP 1.6.0 (https://github.com/apache/opennlp/blob/trunk/opennlp-tools/src/main/java/opennlp/tools/lemmatizer/LemmatizerME.java). I have this dependency: groupId org.apache.opennlp, artifactId opennlp-tools, version 1.6.0

Re: EntityLinker example

2016-11-26 Thread Damiano Porta
t it up, but just looking at > the code will probably make sense. The GeoEntityLinker is in the ADDONS > repo. > > On Sat, Nov 26, 2016 at 5:51 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello, > > do you have an example or a test to see how the EntityLinker works? > > Thanks > > > > Damiano > > >

Re: Why can i not serialize a Dictionary ?

2016-10-31 Thread Damiano Porta
k the API supports this. You will need a hack. > > 2016-10-30 12:59 GMT-02:00 Damiano Porta <damianopo...@gmail.com>: > > > Jorn > > what suffix should i use if i need a postagger model in a > FeatureGenerator? > > > > For dictionary i use mydictionary.dicti

Re: Why can i not serialize a Dictionary ?

2016-10-30 Thread Damiano Porta
Jorn, what suffix should I use if I need a POS tagger model in a FeatureGenerator? For the dictionary I use mydictionary.dictionary, as you told me. What about the postagger .bin? Thanks Damiano On 29 Oct 2016 14:27, "Damiano Porta" <damianopo...@gmail.com> wrote: > ok! thank yo

Re: Why can i not serialize a Dictionary ?

2016-10-29 Thread Damiano Porta
ok! thank you Jorn! 2016-10-29 13:54 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > The class has to be on your classpath otherwise it can't be loaded. > > Jörn > > On Fri, 2016-10-28 at 22:59 +0200, Damiano Porta wrote: > > Jorn, > > as I wrote i have crea

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Damiano Porta
he my xml descriptor: Damiano 2016-10-28 14:00 GMT+02:00 Damiano Porta <damianopo...@gmail.com>: > Pardon, my wrong, i forgot to change dict="damiano"/> into dict="damiano.dictionary"

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Damiano Porta
Pardon, my mistake: I forgot to change dict="damiano" into dict="damiano.dictionary" in my train.xml. Now it is working well, and the .bin contains my dictionary too. 2016-10-28 13:51 GMT+02:00 Damiano Porta <damianopo...@gmail.com>: > Jorn > i change the code as you told me, this exactly: https://gist.github.com/

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Damiano Porta
) at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:153) ... 4 more 2016-10-28 12:55 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > Try to rename the dictionary key to xyz.dictionary then the serializer will > be mapped correctly. > > Jörn > > On Thu, Oct 27, 2016 at 11:14 PM,

Custom resources on NER model

2016-10-28 Thread Damiano Porta
Hello, could someone explain how to add a dictionary resource when training a NER model? At the moment I add a map of resources like this: try (InputStream modelIn = new FileInputStream("/home/damiano/fake.xml")) { Dictionary dictionary = new Dictionary(modelIn); map.put("damiano",

Re: Why can i not serialize a Dictionary ?

2016-10-27 Thread Damiano Porta
do not have other info. Do I have to create a custom Serializer too? 2016-10-27 22:04 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > On Thu, 2016-10-27 at 21:18 +0200, Joern Kottmann wrote: > > On Tue, 2016-10-25 at 18:49 +0200, Damiano Porta wrote: > > > > >

Why can i not serialize a Dictionary ?

2016-10-25 Thread Damiano Porta
Hello, I am getting a strange error while building a NER model. The end of the build output is: 98: ... loglikelihood=-13340.018762351776 0.999005934601099 99: ... loglikelihood=-13258.358751926637 0.9990120681028991 100: ... loglikelihood=-13178.039964721707

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
ou need) via attributes on the custom element in the > descriptor. > > This is optional if you don't have any parameters, you don't need to pass > anything at all. > > Jörn > > > On Tue, Oct 25, 2016 at 2:00 PM, Damiano Porta <damianopo...@gmail.com> >

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
sed in only if you extend CustomFeatureGenerator. > That one has an init method which gives you the attributes defined in the > xml descriptor. > > HTH, > Jörn > > On Tue, Oct 25, 2016 at 12:43 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Joern, > >

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
); System.exit(1); } It is just a test to check whether my generator is called. 2016-10-25 12:23 GMT+02:00 Joern Kottmann <kottm...@gmail.com>: > What is the constructor of the > com.damiano.parser.generator.SpanFeatureGenerator > class? > > Jörn > > On Tue, Oct 25

Custom Features Generator example

2016-10-25 Thread Damiano Porta
Hello, I have created a custom generator implementing the AdaptiveFeatureGenerator interface. I am getting this error: Exception in thread "main" opennlp.tools.util.ext.ExtensionNotLoadedException: java.lang.InstantiationException: com.damiano.parser.generator.SpanFeatureGenerator at

Re: Document categorization

2016-09-24 Thread Damiano Porta
ty of practitioners (the first mailing list in > https://opennlp.apache.org/mail-lists.html). > > Cohan > > > On Sat, Sep 24, 2016 at 7:12 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hello, > > we need to categorize our documents in 80 secto

Document categorization

2016-09-24 Thread Damiano Porta
Hello, we need to categorize our documents into 80 sectors. These documents are resumes/CVs. We have many documents (more than 30k), but there is a problem. Should we extract the job positions from each resume and categorize those, or can we just pass the entire document and categorize it in

Re: How to train a Tokenizer for emails ?

2016-09-10 Thread Damiano Porta
ok, thanks! 2016-09-10 23:46 GMT+02:00 William Colen <william.co...@gmail.com>: > When I need I debug the code. I don't know if there is a better way. > > > 2016-09-10 18:24 GMT-03:00 Damiano Porta <damianopo...@gmail.com>: > > > Hi WIlliam! > > Yeah i

Re: How to train a Tokenizer for emails ?

2016-09-10 Thread Damiano Porta
the token looks like an email or telephone. > > > Regards > William > > > On Monday, 29 August 2016, Damiano Porta < > damianopo...@gmail.com> wrote: > > > Hello, > > I am creating a custom tokenizer. It works pretty w

How to train a Tokenizer for emails ?

2016-08-29 Thread Damiano Porta
Hello, I am creating a custom tokenizer. It works pretty well, but I have problems with emails. Emails can contain the characters _ - . which are split off in normal text, so the question is: how can I train it better? After tokenization I need to apply different regexes to extract emails/dates/telephone numbers, so I

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
> Daniel Russ, Ph.D. > Staff Scientist, Office of Intramural Research > Center for Information Technology > National Institutes of Health > U.S. Department of Health and Human Services > 12 South Drive > Bethesda, MD 20892-5624 > > On Aug 26, 2016, at 1:46 PM, Damiano Porta <

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
0892-5624 On Aug 26, 2016, at 12:15 PM, Damiano Porta <damianopo...@gmail.com> wrote: Thanks Joern! If I have understood you correctly... IF I do not need relations between sentences, I can skip sentence detection, right? On 26 Aug 2016 16:33, "Joern Kottmann" <kottm...@gmai

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
ire document. > > Jörn > > On Fri, Aug 26, 2016 at 3:25 PM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > Hi! > > Yes I can train a good model (sure It will takes a lot of time), i have > 30k > > resumes. So the "data" isnt a problem. >

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
that help? > Daniel > > > Daniel Russ, Ph.D. > Staff Scientist, Office of Intramural Research > Center for Information Technology > National Institutes of Health > U.S. Department of Health and Human Services > 12 South Drive > Bethesda, MD 20892-5624 > > On Aug 25,

Is sentence detection process really needed?

2016-08-25 Thread Damiano Porta
Hello everybody! Could someone explain why I should split my documents into sentences to train my models? My documents are resumes/CVs and the sentences can be very different. For example, a sentence could also be: 1. Name: John 2. Surname: travolta etc. So my question is: what is

Re: Generators

2016-08-17 Thread Damiano Porta
ionaryNameFinder. > > http://opennlp.apache.org/documentation/1.6.0/apidocs/ > opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html > > Regards > William > > 2016-08-16 15:50 GMT-03:00 Damiano Porta <damianopo...@gmail.com>: > > > Hello, > > >

Generators

2016-08-16 Thread Damiano Porta
Hello, sorry for all these questions, guys, but I am trying to study OpenNLP in depth. I wrote some simple code; you can see it here: https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP I am trying to understand what the generators are and what their job is. I know they add

RegexNameFinderFactory with SimpleTokenizer

2016-08-16 Thread Damiano Porta
Hello, besides persons, addresses, etc., I also need to extract emails and telephone numbers from my documents. I just found https://github.com/apache/opennlp/blob/cac4db6d3cb74ae3414fc8c438eec770af783538/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinderFactory.java Reading the code, it seems to be
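Because a tokenizer that splits on punctuation turns an address like john.doe@example.com into several tokens, one option is to run the patterns on the raw text, keep character offsets, and map them back onto tokens afterwards. A minimal standalone sketch using java.util.regex follows; the patterns are simplified assumptions, and this is not the RegexNameFinder API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawTextRegexFinder {
    /** Hypothetical patterns keyed by entity type; simplified, not production-grade. */
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("email", Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"));
        PATTERNS.put("telephone", Pattern.compile("\\+?\\d[\\d ]{7,}\\d"));
    }

    /** Returns matches as "type[start,end)=text", using character offsets into the raw text. */
    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            Matcher m = e.getValue().matcher(text);
            while (m.find()) {
                found.add(e.getKey() + "[" + m.start() + "," + m.end() + ")=" + m.group());
            }
        }
        return found;
    }
}
```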

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
" version of it) 2016-08-12 16:51 GMT+02:00 Damiano Porta <damianopo...@gmail.com>: > Ok thank you so much guys! > > 2016-08-12 16:43 GMT+02:00 William Colen <william.co...@gmail.com>: > >> You need to train with a corpus that is as close as possible as your

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
n entity is too often. Like, there is > an entity in the middle of every window. > > > 2016-08-12 11:35 GMT-03:00 Damiano Porta <damianopo...@gmail.com>: > > > Ok, but why not just ignore all the others tokens? i mean... when i > write 2 > > TOKENS + ENTI

Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
Hello everyone, sorry for the silly question, but I really do not understand the point of training a maxent model with complete sentences. For example: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 . It has ~20 tokens. As described here:

Re: Reg. MaxENT and GIS.

2016-07-05 Thread Damiano Porta
Hi William, we need to update the link; it points to the wrong page and returns Not Found. 2016-07-05 13:19 GMT+02:00 William Colen : > It is not that easy. You could start from "Papers implemented by OpenNLP": > >

Re: Model to detect the gender

2016-07-04 Thread Damiano Porta
onent. > > Jörn > > On Mon, Jul 4, 2016 at 2:41 PM, Joern Kottmann <kottm...@gmail.com> wrote: > > > I was speaking about the second case. We could build a dedicated > component > > specialized in extracting properties about already detected entities. > > >

Re: Model to detect the gender

2016-06-30 Thread Damiano Porta
words, > etc.) and perform a classification task using any machine learning > algorithm. > > Another way would be using the information itself (whether the name fits > for males, females or both) as a feature when you perform the > classification. > > Best regards, > >

Re: Model to detect the gender

2016-06-29 Thread Damiano Porta
otes of its effectiveness. Than > change/add a feature, evaluate and take notes. Sometimes a feature that we > are sure would help can destroy the model effectiveness. > > Regards > William > > > 2016-06-29 7:00 GMT-03:00 Damiano Porta <damianopo...@gmail.com>: > > &

Re: Model to detect the gender

2016-06-29 Thread Damiano Porta
l. I was only thinking how to implement a gender ML model > that uses the surrounding context. > > Hope I could clarify. > > William > > 2016-06-28 19:15 GMT-03:00 Damiano Porta <damianopo...@gmail.com>: > > > Hi William, > > Ok, so you are talking about a ki

Re: Model to detect the gender

2016-06-28 Thread Damiano Porta
ow it improves. > > 2016-06-28 18:56 GMT-03:00 Damiano Porta <damianopo...@gmail.com>: > > > Hello everybody, > > > > we built a NER model to find persons (name) inside our documents. > > We are looking for the best approach to understand if the name is &

Re: Surronding tokens of the entity on MaxEnt models

2016-05-01 Thread Damiano Porta
g/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html > > > On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <damianopo...@gmail.com> > wrote: > > > > Hello everybody > > How many surrounding tokens are kept into acco
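To illustrate the surrounding-tokens question: a window feature generator adds features for the tokens within a fixed distance of the current one, so tokens outside the window never influence that token's features directly. The sketch below takes the window size as parameters (a window of two tokens on each side is a common choice); the lowercasing and the feature prefixes "w=", "p1w=", "n1w=" are illustrative assumptions, not necessarily the exact strings WindowFeatureGenerator produces.

```java
import java.util.ArrayList;
import java.util.List;

class WindowFeatures {
    /**
     * Features for tokens[index] using `prev` tokens to the left and `next` to
     * the right; positions that fall outside the sentence are simply skipped.
     */
    static List<String> features(String[] tokens, int index, int prev, int next) {
        List<String> feats = new ArrayList<>();
        feats.add("w=" + tokens[index].toLowerCase());
        for (int i = 1; i <= prev && index - i >= 0; i++) {
            feats.add("p" + i + "w=" + tokens[index - i].toLowerCase());
        }
        for (int i = 1; i <= next && index + i < tokens.length; i++) {
            feats.add("n" + i + "w=" + tokens[index + i].toLowerCase());
        }
        return feats;
    }
}
```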

Re: How to handle big dictionaries to find typos

2015-09-14 Thread Damiano Porta
n replace DictionaryNameFinder with a Lucene > index. When you mentioned DictionaryNameFinder I was thinking at Name > entity recognition module (tagging being done using a NER model). > > Sorry for this misunderstanding. > > BR, > Catalin > > > On 09/14/2015 03:31 P

How to handle big dictionaries to find typos

2015-09-13 Thread Damiano Porta
Hello, I have created a very big dictionary of companies; it is around 3M. At the moment I am using the DictionaryNameFinder class, but I need to implement something that also catches typos like Gogle/Gooogle Inc, etc. I read something about Levenshtein distance; is this implemented in OpenNLP? It seems good
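For reference, Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, so "Gogle" and "Gooogle" are both at distance 1 from "Google". Below is a minimal dynamic-programming sketch plus a naive lookup; the class and method names are hypothetical. A linear scan per lookup will be slow on a dictionary this large, which is why the reply suggests a Lucene index instead (Lucene's fuzzy queries are based on edit distance).

```java
import java.util.ArrayList;
import java.util.List;

class FuzzyDictionary {
    /** Classic two-row dynamic-programming Levenshtein distance (unit costs). */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Naive linear scan: every entry within maxDist edits of the query (case-insensitive). */
    static List<String> lookup(List<String> dictionary, String query, int maxDist) {
        List<String> hits = new ArrayList<>();
        for (String entry : dictionary) {
            if (distance(entry.toLowerCase(), query.toLowerCase()) <= maxDist) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```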

Best method to confirm an entity

2015-09-03 Thread Damiano Porta
Hello! I would like to understand the best approach to the following problem. I have documents very similar to resumes/CVs and I have to extract entities (Name, Surname, Birthday, Cities, zipcode, etc.). To extract those entities I am combining different finders: Birthday and zipcodes =