Best method to confirm an entity

2015-09-03 Thread Damiano Porta
Hello! I would like to understand the best approach to the following problem. I have documents very similar to resumes/CVs and I have to extract entities (name, surname, birthday, cities, zip code, etc.). To extract those entities I am combining different finders: Birthday and zipcodes = RegexName

How to handle big dictionaries to find typos

2015-09-13 Thread Damiano Porta
Hello, I have created a very big dictionary of companies, around 3M entries. At the moment I am using the DictionaryNameFinder class, but I need to implement something to find typos like Gogle/Gooogle Inc etc. I read something about Levenshtein distance; is this implemented in OpenNLP? It seems good bu
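The typo matching asked about above can be sketched in plain Java. This is only an illustration of Levenshtein (edit) distance against a dictionary, not an OpenNLP API: the class name FuzzyDictionaryLookup, the method names, and the maxDistance parameter are all hypothetical. It scans the whole list, which would likely be too slow for 3M entries without an index such as the Lucene fuzzy query suggested in the reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP: fuzzy lookup by edit distance.
final class FuzzyDictionaryLookup {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the entries within maxDistance edits of the query, ignoring case.
    static List<String> closeMatches(String query, List<String> entries, int maxDistance) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String entry : entries) {
            // Cheap pre-filter: the length difference is a lower bound on the distance.
            if (Math.abs(entry.length() - q.length()) > maxDistance) {
                continue;
            }
            if (levenshtein(q, entry.toLowerCase(Locale.ROOT)) <= maxDistance) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

With maxDistance set to 1, both "Gogle" and "Gooogle" would match a dictionary entry "Google", since each is one edit away.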

Re: How to handle big dictionaries to find typos

2015-09-13 Thread Damiano Porta
Hi Catalin, Can I use it with DictionaryNameFinder? Thanks Damiano On Sun 13 Sep 2015 21:08, Catalin Mititelu wrote: > Hi Damiano, > > You may try Lucene fuzzy query which is based on Levenshtein distance. > > BR, > Catalin > > On 09/13/2015 09:59 PM, Damian

Re: How to handle big dictionaries to find typos

2015-09-14 Thread Damiano Porta
ect orm > in your DictionaryNameFinder. > > Please let me know if it seems feasible. > > BR, > Catalin > > > > On 09/13/2015 10:35 PM, Damiano Porta wrote: > >> Hi Catalin, >> Can i use it with DictionaryNameFinder? >> Thanks >> Damiano

Re: How to handle big dictionaries to find typos

2015-09-14 Thread Damiano Porta
h a Lucene > index. When you mentioned DictionaryNameFinder I was thinking at Name > entity recognition module (tagging being done using a NER model). > > Sorry for this misunderstanding. > > BR, > Catalin > > > On 09/14/2015 03:31 PM, Damiano Porta wrote: > >>

Grammars

2015-12-13 Thread Damiano Porta
Hello! Is there a grammar (pattern engine) like https://gate.ac.uk/sale/tao/splitch8.html#chap:jape for OpenNLP? Thank you!

TokensRegex

2015-12-28 Thread Damiano Porta
Hello, is there a tool like http://nlp.stanford.edu/software/tokensregex.shtml in OpenNLP? Thanks Damiano

Re: TokensRegex

2016-01-04 Thread Damiano Porta
Thanks Michael! 2016-01-04 17:39 GMT+01:00 Michael Schmitz : > You could use https://github.com/knowitall/openregex or > https://github.com/knowitall/openregex-scala. They are toolkit-neutral. > > Peace. Michael > > On Mon, Dec 28, 2015 at 3:56 AM, Damiano Porta > wrote

Surrounding tokens of the entity on MaxEnt models

2016-05-01 Thread Damiano Porta
Hello everybody, how many surrounding tokens are taken into account to find the entity using a maxent model? Basically a maxent model should detect an entity by looking at the surrounding tokens, right? I would like to understand: 1. can I set the number of tokens on the left side? 2. can I set the

Re: Surrounding tokens of the entity on MaxEnt models

2016-05-01 Thread Damiano Porta
docs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html > > > On Sun, May 1, 2016 at 5:16 AM, Damiano Porta > wrote: > > > > Hello everybody > > How many surrounding tokens are kept into account to find the entity > using > > a maxen

Re: Surrounding tokens of the entity on MaxEnt models

2016-05-02 Thread Damiano Porta
eaturegen/AdaptiveFeatureGenerator.html > [2] > > https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/SentenceFeatureGenerator.html > > > On Sun, May 1, 2016 at 12:02 PM, Damiano Porta > wrote: > > Hi Jeff! > Thank you so

Model to detect the gender

2016-06-28 Thread Damiano Porta
Hello everybody, we built a NER model to find persons (name) inside our documents. We are looking for the best approach to understand if the name is male/female. Possible solutions: - Plain dictionary? - Regex to check the initial and/letters of the name? - Classifier? (naive bayes? Maxent?) Tha

Re: Model to detect the gender

2016-06-28 Thread Damiano Porta
2016-06-28 18:56 GMT-03:00 Damiano Porta : > > > Hello everybody, > > > > we built a NER model to find persons (name) inside our documents. > > We are looking for the best approach to understand if the name is > > male/female. > > > > Possible solutions: >

Re: Model to detect the gender

2016-06-29 Thread Damiano Porta
> Do you plan to use the surrounding context? If yes, maybe you could try > > to > > > split NER in two categories: PersonM and PersonF. Just an idea, never > > read > > > or tried anything like it. You would need a training corpus with these > > > classes.

Re: Model to detect the gender

2016-06-29 Thread Damiano Porta
n > change/add a feature, evaluate and take notes. Sometimes a feature that we > are sure would help can destroy the model effectiveness. > > Regards > William > > > 2016-06-29 7:00 GMT-03:00 Damiano Porta : > > > Thank you William! Really appreciated! > > >

Re: Model to detect the gender

2016-06-30 Thread Damiano Porta
m a classification task using any machine learning > algorithm. > > Another way would be using the information itself (whether the name fits > for males, females or both) as a feature when you perform the > classification. > > Best regards, > > Mondher > > I am n

Re: Model to detect the gender

2016-07-04 Thread Damiano Porta
features > > F1 = False > F2 = True > F3 = UNCERTAIN > F4 = 1 > F5 = FEMALE > F6 = 3 > F7 = FEMALE > F8 = 4 > F9 = UNCERTAIN > F10 = 2 > F11 = EMPTY > F12 = 0 > F13 = EMPTY > F14 = 0 > > Of course the choice of features depends on the type of data

Re: Model to detect the gender

2016-07-04 Thread Damiano Porta
un left (in > > words)Values=NUMERIC > > > > In the second example here are the values you have for your features > > > > F1 = False > > F2 = True > > F3 = UNCERTAIN > > F4 = 1 > > F5 = FEMALE > > F6 = 3 > > F7 = FEMALE

Re: Model to detect the gender

2016-07-04 Thread Damiano Porta
n Mon, Jul 4, 2016 at 2:41 PM, Joern Kottmann wrote: > > > I was speaking about the second case. We could build a dedicated > component > > specialized in extracting properties about already detected entities. > > > > Jörn > > > > On Mon, Jul 4, 2016 at

Re: Reg. MaxENT and GIS.

2016-07-05 Thread Damiano Porta
Hi William, we need to update the link, it is pointing to a wrong page. It returns Not Found. 2016-07-05 13:19 GMT+02:00 William Colen : > It is not that easy. You could start from "Papers implemented by OpenNLP": > > https://cwiki.apache.org/confluence/display/OPENNLP/NLP+Papers > > I believe th

Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
Hello everyone, pardon for the stupid question but i really do not get the point about training a maxent model with complete sentences. For example: Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 . it has ~20 tokens. As described here: https://opennlp.apa

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
be ignored. 2016-08-12 16:26 GMT+02:00 William Colen : > You also need examples of what is not entities. > > > 2016-08-12 11:21 GMT-03:00 Damiano Porta : > > > Hello everyone, > > pardon for the stupid question but i really do not get the point about > > trai

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
e is > an entity in the middle of every window. > > > 2016-08-12 11:35 GMT-03:00 Damiano Porta : > > > Ok, but why not just ignore all the others tokens? i mean... when i > write 2 > > TOKENS + ENTITY + 2 TOKENS i am interested on finding the entity with > this >

Re: Why are you using complete sentences to train a model?

2016-08-12 Thread Damiano Porta
" version of it) 2016-08-12 16:51 GMT+02:00 Damiano Porta : > Ok thank you so much guys! > > 2016-08-12 16:43 GMT+02:00 William Colen : > >> You need to train with a corpus that is as close as possible as your >> runtime corpus. If your runtime corpus is like that I thi

RegexNameFinderFactory with SimpleTokenizer

2016-08-16 Thread Damiano Porta
Hello, After person, addresses etc I also need to extract email/telephone from my documents, i just found https://github.com/apache/opennlp/blob/cac4db6d3cb74ae3414fc8c438eec770af783538/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinderFactory.java Reading the code it seems to be

Generators

2016-08-16 Thread Damiano Porta
Hello, pardon guys for all these questions but i am trying to study OpenNLP deeply. I write a simple code, you can see it here: https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP I am trying to understand what the generators are and what is their job. I know they add fea

Re: Generators

2016-08-17 Thread Damiano Porta
p://opennlp.apache.org/documentation/1.6.0/apidocs/ > opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html > > Regards > William > > 2016-08-16 15:50 GMT-03:00 Damiano Porta : > > > Hello, > > > > pardon guys for all these questions bu

Is sentence detection process really needed?

2016-08-25 Thread Damiano Porta
Hello everybody! Could someone explain why should I separate each sentence of my documents to train my models? My documents are like resume/cv and the sentences can be very different. For example a sentence could also be : 1. Name: John 2. Surname: travolta Etc etc So my question is. What is the

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
> > > Daniel Russ, Ph.D. > Staff Scientist, Office of Intramural Research > Center for Information Technology > National Institutes of Health > U.S. Department of Health and Human Services > 12 South Drive > Bethesda, MD 20892-5624 > > On Aug 25, 2016, at 9:55 AM, Damian

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
Drive Bethesda, MD 20892-5624 On Aug 26, 2016, at 5:57 AM, Damiano Porta > wrote: Hi Daniel! Thank you so much for your opinion. It makes perfectly sense. But i am still a bit confused about the length of the sentences. In a resume there are many names, dates etc etc. So my doubt is regardi

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
Jörn > > On Fri, Aug 26, 2016 at 3:25 PM, Damiano Porta > wrote: > > > Hi! > > Yes I can train a good model (sure It will takes a lot of time), i have > 30k > > resumes. So the "data" isnt a problem. > > I thought about many things, i am also creati

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
t pass in the entire document. Jörn On Fri, Aug 26, 2016 at 3:25 PM, Damiano Porta mailto:damianopo...@gmail.com>> wrote: Hi! Yes I can train a good model (sure It will takes a lot of time), i have 30k resumes. So the "data" isnt a problem. I thought about many things, i am al

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
Staff Scientist, Office of Intramural Research > Center for Information Technology > National Institutes of Health > U.S. Department of Health and Human Services > 12 South Drive > Bethesda, MD 20892-5624 > > On Aug 26, 2016, at 1:46 PM, Damiano Porta damianopo...@gmail.com>>

Re: Is sentence detection process really needed?

2016-08-26 Thread Damiano Porta
Pardon, I meant the word "my" ... On 26 Aug 2016 20:49, "Damiano Porta" wrote: > But I think it is the same, no? I mean... I will pass all the content as > one sentence. So in this case the word "the" will be tagged the same. > > The problem in t

How to train a Tokenizer for emails ?

2016-08-29 Thread Damiano Porta
Hello, I am creating a custom tokenizer. It works pretty well but i have problems with emails. The emails can have _ - . that are tokenized in normal text, so the question is, how can i train it better? After the tokenization I need to apply different regexes to extract email/dates/telephones so i

Re: How to train a Tokenizer for emails ?

2016-09-10 Thread Damiano Porta
n email or telephone. > > > Regards > William > > > On Monday, 29 August 2016, Damiano Porta < > damianopo...@gmail.com> wrote: > > > Hello, > > I am creating a custom tokenizer. It works pretty well but i have > problems > > with e

Re: How to train a Tokenizer for emails ?

2016-09-10 Thread Damiano Porta
ok, thanks! 2016-09-10 23:46 GMT+02:00 William Colen : > When I need I debug the code. I don't know if there is a better way. > > > 2016-09-10 18:24 GMT-03:00 Damiano Porta : > > > Hi WIlliam! > > Yeah i will go with custom generator that add specific feat

Document categorization

2016-09-24 Thread Damiano Porta
Hello, we need to categorize our documents in 80 sectors. These documents are resumes/cv. We have many documents (more than 30k) but there is a problem. Should we try to extract the job positions inside each resume and categorize them or can we just add the entire document and categorize it in one

Re: Document categorization

2016-09-24 Thread Damiano Porta
unity of practitioners (the first mailing list in > https://opennlp.apache.org/mail-lists.html). > > Cohan > > > On Sat, Sep 24, 2016 at 7:12 PM, Damiano Porta > wrote: > > > Hello, > > we need to categorize our documents in 80 sectors. These documents are > > resumes/cv.

Custom Features Generator example

2016-10-25 Thread Damiano Porta
Hello, I have created a custom generator implementing the AdaptiveFeatureGenerator interface. I am getting this error: Exception in thread "main" opennlp.tools.util.ext.ExtensionNotLoadedException: java.lang.InstantiationException: com.damiano.parser.generator.SpanFeatureGenerator at opennlp.tool

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
); System.exit(1); } It is obviously a test to understand if my generator is called. 2016-10-25 12:23 GMT+02:00 Joern Kottmann : > What is the constructor of the > com.damiano.parser.generator.SpanFeatureGenerator > class? > > Jörn > > On Tue, Oct 25, 2016 at 11:51 AM, D

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
Joern, However i also tried with: public SpanFeatureGenerator(Map properties, FeatureGeneratorResourceProvider resourceProvider) throws InvalidFormatException { } but i get the same exception. Damiano 2016-10-25 12:30 GMT+02:00 Damiano Porta : > This at the moment: > >

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
reGenerator. > That one has an init method which gives you the attributes defined in the > xml descriptor. > > HTH, > Jörn > > On Tue, Oct 25, 2016 at 12:43 PM, Damiano Porta > wrote: > > > Joern, > > However i also tried wi

Re: Custom Features Generator example

2016-10-25 Thread Damiano Porta
on the custom element in the > descriptor. > > This is optional if you don't have any parameters, you don't need to pass > anything at all. > > Jörn > > > On Tue, Oct 25, 2016 at 2:00 PM, Damiano Porta > wrote: > > > Oh thanks! i try. > > If I ex

Why can i not serialize a Dictionary ?

2016-10-25 Thread Damiano Porta
Hello, i am getting a strange error during the compiling of a NER model. Basically, the end of the build output is: 98: ... loglikelihood=-13340.018762351776 0.999005934601099 99: ... loglikelihood=-13258.358751926637 0.9990120681028991 100: ... loglikelihood=-13178.039964721707 0.9990177634

Re: Why can i not serialize a Dictionary ?

2016-10-27 Thread Damiano Porta
do not have other info. Do i have to create a custom Serializer too? 2016-10-27 22:04 GMT+02:00 Joern Kottmann : > On Thu, 2016-10-27 at 21:18 +0200, Joern Kottmann wrote: > > On Tue, 2016-10-25 at 18:49 +0200, Damiano Porta wrote: > > > > > > i am getting a strange er

Custom resources on NER model

2016-10-28 Thread Damiano Porta
Hello, could someone explain how to add a dictionary resource during the train of a NER model? At the moment i add a map of resources doing: try (InputStream modelIn = new FileInputStream("/home/damiano/fake.xml")) { Dictionary dictionary = new Dictionary(modelIn); map.put("damiano", dictionary)

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Damiano Porta
opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:153) ... 4 more 2016-10-28 12:55 GMT+02:00 Joern Kottmann : > Try to rename the dictionary key to xyz.dictionary then the serializer will > be mapped correctly. > > Jörn > > On Thu, Oct 27, 2016 at 11:14 PM, Damiano Porta > wrote: > >

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Damiano Porta
Pardon, my mistake: I forgot to change dict="damiano" into dict="damiano.dictionary" in my train.xml. Now it is working well, and the .bin has my dictionary too. 2016-10-28 13:51 GMT+02:00 Damiano Porta : > Jorn > i change the code as you told me, this exactly: https://gist.github.com/ > anonymous/8877b09d441d2e64c181fa9

Re: Why can i not serialize a Dictionary ?

2016-10-28 Thread Damiano Porta
is the my xml descriptor: Damiano 2016-10-28 14:00 GMT+02:00 Damiano Porta : > Pardon, my wrong, i forgot to change dict="damiano"/> into dict="damiano.dictionary"/>in my train.xml > >

Re: Why can i not serialize a Dictionary ?

2016-10-29 Thread Damiano Porta
ok! thank you Jorn! 2016-10-29 13:54 GMT+02:00 Joern Kottmann : > The class has to be on your classpath otherwise it can't be loaded. > > Jörn > > On Fri, 2016-10-28 at 22:59 +0200, Damiano Porta wrote: > > Jorn, > > as I wrote i have created the ner mod

Re: Why can i not serialize a Dictionary ?

2016-10-30 Thread Damiano Porta
Jorn, what suffix should I use if I need a postagger model in a FeatureGenerator? For dictionary I use mydictionary.dictionary as you told me. What about postagger .bin? Thanks Damiano On 29 Oct 2016 14:27, "Damiano Porta" wrote: > ok! thank you Jorn! > > 2016-10-29 13

Re: Why can i not serialize a Dictionary ?

2016-10-31 Thread Damiano Porta
ill need a hack. > > 2016-10-30 12:59 GMT-02:00 Damiano Porta : > > > Jorn > > what suffix should i use if i need a postagger model in a > FeatureGenerator? > > > > For dictionary i use mydictionary.dictionary as you told me. What about > > postagger .bin? >

Re: Why can i not serialize a Dictionary ?

2016-10-31 Thread Damiano Porta
ger for each call. > Maybe you can store the output to a cache. > > 2016-10-31 10:49 GMT-02:00 Damiano Porta : > > > hmm ok i load the postagger in the constructor of my custom > > FeatureGenerator, this is an example: > > > > https://gist.github.com/ano

Coreference resolution

2016-10-31 Thread Damiano Porta
Hello, can we implement coreference resolution someway? Thanks Damiano

Face recognition

2016-11-24 Thread Damiano Porta
Hello, is there something i can use for image (face in this case) recognition? Thanks Damiano

EntityLinker example

2016-11-26 Thread Damiano Porta
Hello, do you have an example or a test to see how the EntityLinker works? Thanks Damiano

Re: EntityLinker example

2016-11-26 Thread Damiano Porta
oking at > the code will probably make sense. The GeoEntityLinker is in the ADDONS > repo. > > On Sat, Nov 26, 2016 at 5:51 PM, Damiano Porta > wrote: > > > Hello, > > do you have an example or a test to see how the EntityLinker works? > > Thanks > > > > Damiano > > >

LemmatizerME via Maven

2016-12-04 Thread Damiano Porta
Hello, there is no LemmatizerME class in OpenNLP 1.6.0 ( https://github.com/apache/opennlp/blob/trunk/opennlp-tools/src/main/java/opennlp/tools/lemmatizer/LemmatizerME.java ) I have this dependency: <dependency> <groupId>org.apache.opennlp</groupId> <artifactId>opennlp-tools</artifactId> <version>1.6.0</version> </dependency> How

Re: LemmatizerME via Maven

2016-12-04 Thread Damiano Porta
never mind... https://opennlp.apache.org/maven-dependency.html#opennlp-tools-snapshot-dependency 2016-12-04 12:14 GMT+01:00 Damiano Porta : > Hello, > there is not LemmatizerME class in OpenNLP 1.6.0 > (https://github.com/apache/opennlp/blob/trunk/opennlp- > tools/src/main/java/o

Lemmatizer BUG

2016-12-05 Thread Damiano Porta
Hello, I am doing some tests with the lemmatizerME. It is returning a wrong word, a word that never occurs in the training data. Basically it is NOT an italian word :) The output is: [O, O, O, O, *R1trR0ae*] The code: try (InputStream in = new FileInputStream("/home/damiano/lemmas.bin")

Re: Lemmatizer BUG

2016-12-05 Thread Damiano Porta
d predicted lemma classes, to perform the decoding (apply the > permutations) and output the actual lemma (iniziare in your example). > > Cheers, > > Rodrigo > > On Mon, Dec 5, 2016 at 11:19 AM, Damiano Porta > wrote: > > > Hello, > > I am doing some tests wit

Re: Lemmatizer BUG

2016-12-05 Thread Damiano Porta
required to go from the word form to > the lemma. This is because it is much easier to generalize (e.g., many > word-lemma pairs are captured by the same permutation class) to learn over > those permutation classes than on the lemmas themselves. > > HTH, > > Rodrigo > > &g

Re: EntityLinker example

2016-12-06 Thread Damiano Porta
Hello again Mark, pardon for the late reply. Basically i would like to link entities (different types) inside sentences. I do not need to link them to an external resource, i am referring to co-reference. Is entitylinker good for that? Thanks 2016-11-27 0:18 GMT+01:00 Damiano Porta : > Tha

Re: EntityLinker example

2016-12-06 Thread Damiano Porta
Hmm, ok, so i must use an external tool for coreference. 2016-12-06 12:24 GMT+01:00 Mark G : > No , sorry, I don't think it would be any help for that. > > Sent from my iPhone > > > On Dec 6, 2016, at 5:57 AM, Damiano Porta > wrote: > > > > Hello aga

Re: EntityLinker example

2016-12-06 Thread Damiano Porta
g. > > On Tue, Dec 6, 2016 at 7:07 AM, Damiano Porta > wrote: > > > Hmm, ok, so i must use an external tool for coreference. > > > > 2016-12-06 12:24 GMT+01:00 Mark G : > > > > > No , sorry, I don't think it would be any help for that. > > >

Get Original text

2016-12-16 Thread Damiano Porta
Hello, is it possible to get/pass the original text inside a Custom NER Feature Generator somehow? Thanks Damiano

Re: Get Original text

2016-12-19 Thread Damiano Porta
original text too and analyze it inside the createFeatures() callback. Is it possible somehow? 2016-12-19 13:54 GMT+01:00 Joern Kottmann : > Hello, > > the question is not clear to me. The feature generator sees the current > sentence only. > > Jörn > > On Sat, Dec 17,

Sentence's outcomes

2017-01-14 Thread Damiano Porta
Hello, using the find() method of NameFinderME I get a Span[]. Is it possible to get the list of outcomes as a String[] with the BIO codec? Thanks Damiano
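One way to picture the conversion asked about above is the following self-contained sketch. It does not use OpenNLP classes: EntitySpan is a hypothetical stand-in for a span with an inclusive start, exclusive end, and a type, and the "B-"/"I-"/"O" labels are a generic BIO convention assumed here, not necessarily the outcome strings OpenNLP uses internally.

```java
import java.util.Arrays;

// Hypothetical sketch: turn token spans into one BIO tag per token.
final class BioOutcomes {

    // Stand-in for a typed span; start is inclusive, end is exclusive.
    static final class EntitySpan {
        final int start;
        final int end;
        final String type;

        EntitySpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Every token defaults to "O"; the first token of a span gets "B-<type>",
    // the remaining tokens of that span get "I-<type>".
    static String[] toBio(int tokenCount, EntitySpan[] spans) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O");
        for (EntitySpan span : spans) {
            for (int i = span.start; i < span.end; i++) {
                tags[i] = (i == span.start ? "B-" : "I-") + span.type;
            }
        }
        return tags;
    }
}
```

For five tokens and a single "person" span covering tokens 1 and 2, this yields O, B-person, I-person, O, O. Overlapping spans are not handled; a later span simply overwrites an earlier one.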

Re: Suggestion/Query - Adding weights to words in Document Classifier

2017-01-18 Thread Damiano Porta
Manoj, you can add custom feature using a generator that implements this: https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/FeatureGenerator.java take a look at https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doc

Adding entries to resources without re-training the TokenNameFinderModel

2017-01-25 Thread Damiano Porta
Hello everybody, I am using the NameFinder tool with a custom TokenNameFinderModel model. I built this model using many DictionaryFeatureGenerators that call dictionaries i have loaded during the training. TokenNameFinderFactory factory = new TokenNameFinderFactory( IOUtils.toByteArray(in),

word2vec

2017-01-25 Thread Damiano Porta
Hello, how can we use word2vec as featuregenerator for NER? I just have found on documentation, can anyone give me more details? Thanks

Re: word2vec

2017-01-25 Thread Damiano Porta
as explained > in the manual: > > http://opennlp.apache.org/documentation/1.7.1/manual/ > opennlp.html#tools.namefind.training > > HTH, > > R > > On Wed, Jan 25, 2017 at 8:09 PM, Damiano Porta > wrote: > > > Hello, how can we use word2vec as featuregener

Re: word2vec

2017-01-25 Thread Damiano Porta
Aha yeah it helped me to understand the input and output formats. ok i will try to create clusters using the official tool. Thanks! Damiano On 25 Jan 2017 21:54, "Rodrigo Agerri" wrote: It might, I forgot that :) R On Wed, Jan 25, 2017 at 9:43 PM, Damiano Porta wrote:

Re: Name Finder trainer default settings

2017-02-07 Thread Damiano Porta
I have good results with perceptron, but +1 for CRF 2017-02-07 15:42 GMT+01:00 Russ, Daniel (NIH/CIT) [E] : > Hi Jörn, > > > >I think the best entity recognition systems use CRF’s. At some point > we might want to consider adding them. As you know, ME classifiers suffer > from label bias pr

Tokenizer for NER training

2017-03-02 Thread Damiano Porta
Hello everybody, i have created a custom tokenizer that does not split specific "patterns" like, emails, telephones, dates etc. I convert them into ONE single token. The other parts of text are tokenized with the SimpleTokenizer. The problem is when i need to train a NER model. For example if my

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
intable (though possible not an alphanumeric character like an > underscore)? > Daniel > > On 3/2/17, 11:46 AM, "Damiano Porta" wrote: > > Hello everybody, > > i have created a custom tokenizer that does not split specific > "patterns" > li

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
at 3011234567.” even though > your regex wont match (if you look at the previous 4 words to catch “call > me”). > > > Daniel > > On 3/2/17, 12:24 PM, "Damiano Porta" wrote: > > Hello Daniel, yes exactly, i do that. I am using regexes to f

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
tokenizer will do it and how it is > annotated in the training data. In any case, the most important thing is > for the tokenization to be consistent for training and testing. > > HTH, > > Rodrigo > > ... > > On Thu, Mar 2, 2017 at 5:46 PM, Damiano Porta > wrote

Re: Tokenizer for NER training

2017-03-02 Thread Damiano Porta
ou will be fine. > As long as in testing time you use the same tokenization. > > Cheers, > > R > > On Thu, Mar 2, 2017 at 11:24 PM, Damiano Porta > wrote: > > > Hi Rodrigo, thanks for your message. > > My problem is that dates does not follow a correct format

BUG in NameSample

2017-03-03 Thread Damiano Porta
Hello everybody, I think i found a bug in NameSample. This is the use case: String[] tokens = new String[] { "0", "1", "2", "3", "4", ",", "6", "7", "8" }; Span[] spans = new Span[] { new Span(7,8, "zipcode"), new Span(1,7, "address"), }; NameSample n = new NameSample(tokens, spans, true); the

CUDA

2017-03-04 Thread Damiano Porta
Hello everybody, does OpenNLP support CUDA parallel computing? Damiano

Training perceptron model

2017-03-05 Thread Damiano Porta
Hello, I am training a NER model with perceptron classifier (using OpenNLP 1.7.0) the output of the training is: Indexing events using cutoff of 0 Computing event counts... done. 11861603 events Indexing... done. Collecting events... Done indexing. Incorporating indexed data for training... d

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
it is doing. Damiano 2017-03-06 10:19 GMT+01:00 Joern Kottmann : > Hello, > > this looks like output from the cross validator. > > Jörn > > On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta > wrote: > > > Hello, > > > > I am training a NER model with

Re: CUDA

2017-03-06 Thread Damiano Porta
e probably add support for one of > the deep learning packages and those usually use CUDA. > > Jörn > > On Sat, Mar 4, 2017 at 5:17 PM, Damiano Porta > wrote: > > > Hello everybody, > > > > does OpenNLP support CUDA parallel computing? > > > > Damiano > > >

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
> > Jörn > > On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta > wrote: > > > Hello Jorn, > > I tried with 300 iterations and it takes forever, reducing that number to > > 100 i can finally get the model in half an hour. > > > > The problem with 300

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
f you use 300 instead of 100 it should take three times as > long. > > Jörn > > On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta > wrote: > > > Jorn, > > I am training and testing the model via api. If it is not a training > > problem. How is that possible tha

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
this is repeated n times, so that each > partition was once used for testing. > > It really should be three times as long in your case, maybe there is > something else wrong?' > > Jörn > > On Mon, Mar 6, 2017 at 12:36 PM, Damiano Porta > wrote: > > > Unfort
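The cross-validation procedure described in the reply above (each partition used once for testing, the rest for training) can be sketched as follows. This is a plain-Java illustration, not the OpenNLP cross validator: the class and method names are hypothetical, and the round-robin assignment of samples to folds is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of n-fold partitioning by sample index.
final class KFoldSketch {

    // Sample i is held out for testing in fold (i % folds).
    static List<List<Integer>> testFolds(int sampleCount, int folds) {
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < sampleCount; i++) {
            result.get(i % folds).add(i);
        }
        return result;
    }

    // Training indices for one run: everything not held out in that fold.
    static List<Integer> trainIndices(int sampleCount, int folds, int heldOutFold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < sampleCount; i++) {
            if (i % folds != heldOutFold) {
                train.add(i);
            }
        }
        return train;
    }
}
```

Since one model is trained per fold, total time grows roughly with the number of folds. In this sketch a single fold leaves no training data at all, because every sample is held out for testing.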

Re: Training perceptron model

2017-03-06 Thread Damiano Porta
issue? 2017-03-06 13:43 GMT+01:00 Damiano Porta : > Oh I see. Thanks! > > Basically i have 30k sentences i apply the labels with a script and then i > pass 0-15k to train the model (to build the .bin) and 15k-30k to evaluate > it. > > I am trying to build the model with

CoReference

2017-05-18 Thread Damiano Porta
Hello everybody, i need a coreference solution to link my entities (DATE, PERSON, ORG). Can someone show me the way to start working on that? Thank you so much. Damiano

Re: CoReference

2017-05-18 Thread Damiano Porta
was any other work not mentioned in that issue. > > Hope that helps > Bruno > > From: Damiano Porta > To: dev@opennlp.apache.org > Sent: Thursday, 18 May 2017 10:54 PM > Subject: CoReference > > > > Hello everybody, > > i

Re: CoReference

2017-05-18 Thread Damiano Porta
Oh, my mistake. Pardon. Do we have accuracy statistics? On 18 May 2017 14:59, "Joern Kottmann" wrote: > This is for linking entities in one document, e.g. first name mention to a > full name mention, or to he, she, it. > > Jörn > > On Thu, May 18, 2017 at 1:2

Re: CoReference

2017-05-18 Thread Damiano Porta
Do you also have an example? :) On 18 May 2017 16:35, "Damiano Porta" wrote: > Oh my wrong. Pardon. > Do we have accuracy statistics? > > On 18 May 2017 14:59, "Joern Kottmann" wrote: > >> This is for linking entities in one document, e.g. fir

Re: CoReference

2017-05-19 Thread Damiano Porta
are more than welcome to get this back into opennlp-tools. > > Jörn > > On Thu, May 18, 2017 at 4:37 PM, Damiano Porta > wrote: > > > Do you also have an example? :) > > > > On 18 May 2017 16:35, "Damiano Porta" > wrote: > > > > > Oh

Stemmer Feature Generator

2017-05-25 Thread Damiano Porta
Hello, do you think a StemmerFeatureGenerator can be useful for NER models? I can create a PR for it. Damiano

Re: Stemmer Feature Generator

2017-05-26 Thread Damiano Porta
Jorn, what is the current performance with CONLL 2003? 2017-05-26 17:43 GMT+02:00 Joern Kottmann : > Hello, > > can you post performance numbers? Only if it helps with some data set it > would make sense to add it. > > Jörn > > On Thu, May 25, 2017 at 3:10 PM, Damiano Por

Re: Stemmer Feature Generator

2017-05-27 Thread Damiano Porta
n > > On Fri, May 26, 2017 at 6:16 PM, Damiano Porta > wrote: > > > Jorn, what is the current performance with CONLL 2003? > > > > 2017-05-26 17:43 GMT+02:00 Joern Kottmann : > > > > > Hello, > > > > > > can you post performance

AdditionalContextFeatureGenerator

2017-05-31 Thread Damiano Porta
Hello, can we not use the generator AdditionalContextFeatureGenerator for training? I do not see the *ne=* feature during the training... only the generators inside my xml are able to add features. How can I see if this custom context is being used? I pass the context in the NameSample:

Missing serializer for postagger.bin

2017-06-05 Thread Damiano Porta
Hello, i am using the POSTaggerFeatureGenerator via generators.xml during the training i add this model in the resources doing: HashMap map = new HashMap<>(); map.put("postagger.bin", myPostaggerModel); factory = new TokenNameFinderFactory( IOUtils.toBy

CrossValidator with folds=1 gives me F-Measure: 0.116

2017-06-06 Thread Damiano Porta
Hello, I am getting very strange results with *TokenNameFinderCrossValidator* API. My generators.xml is: CODE: *try (O
