Hello!
I would like to understand the best approach to the following problem.
I have documents very similar to resumes/CVs and I have to extract entities
(name, surname, birthday, cities, zip code, etc.).
To extract those entities I am combining different finders:
Birthday and zipcodes = RegexName
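As an illustration of the regex-based finders mentioned above, here is a minimal sketch in plain Java. It does not use OpenNLP. The class name `RegexFinders`, the day/month/year date format and the 5-digit zip code are all assumptions made for the example, not details taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: plain java.util.regex, not an OpenNLP class.
final class RegexFinders {
    // Assumed date format: day/month/year with '/', '.' or '-' as separator.
    static final Pattern BIRTHDAY = Pattern.compile(
        "\\b(?:0?[1-9]|[12]\\d|3[01])[/.-](?:0?[1-9]|1[0-2])[/.-](?:19|20)\\d{2}\\b");

    // Assumed zip code format: exactly five digits.
    static final Pattern ZIPCODE = Pattern.compile("\\b\\d{5}\\b");

    /** Returns every substring of text that matches the given pattern. */
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

Real documents will need looser patterns (month names, two-digit years, country-specific postal codes), so these two expressions are only a starting point.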
Hello,
I have created a very large dictionary of companies, around 3M entries.
At the moment I am using the DictionaryNameFinder class, but I need to
implement something to catch typos like Gogle/Gooogle Inc etc.
I read something about Levenshtein distance; is this implemented in
OpenNLP?
It seems good bu
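For readers unfamiliar with the Levenshtein distance asked about above, here is a minimal sketch in plain Java. It is not an OpenNLP API; the class name `FuzzyDictionary` and the method names are made up for the example. It scans the whole dictionary linearly, which would likely be too slow for 3M entries, so an index-based approach such as the Lucene fuzzy query suggested in the reply is probably the more practical route at that scale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: plain Java, not part of OpenNLP.
final class FuzzyDictionary {
    /** Levenshtein distance: insertions, deletions and substitutions each cost 1. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns dictionary entries within maxDistance edits of the candidate (case-insensitive). */
    static List<String> closeMatches(String candidate, List<String> dictionary, int maxDistance) {
        List<String> hits = new ArrayList<>();
        String needle = candidate.toLowerCase(Locale.ROOT);
        for (String entry : dictionary) {
            if (distance(needle, entry.toLowerCase(Locale.ROOT)) <= maxDistance) {
                hits.add(entry);
            }
        }
        return hits;
    }
}
```

With a maximum distance of 1, both "Gogle" and "Gooogle" are within reach of "Google", since each differs from it by a single inserted or deleted letter.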
Hi Catalin,
Can I use it with DictionaryNameFinder?
Thanks
Damiano
On Sun, 13 Sep 2015 at 21:08, Catalin Mititelu
wrote:
> Hi Damiano,
>
> You may try a Lucene fuzzy query, which is based on Levenshtein distance.
>
> BR,
> Catalin
>
> On 09/13/2015 09:59 PM, Damian
ect orm
> in your DictionaryNameFinder.
>
> Please let me know if it seems feasible.
>
> BR,
> Catalin
>
>
>
> On 09/13/2015 10:35 PM, Damiano Porta wrote:
>
>> Hi Catalin,
>> Can I use it with DictionaryNameFinder?
>> Thanks
>> Damiano
h a Lucene
> index. When you mentioned DictionaryNameFinder I was thinking of the named
> entity recognition module (tagging being done using a NER model).
>
> Sorry for this misunderstanding.
>
> BR,
> Catalin
>
>
> On 09/14/2015 03:31 PM, Damiano Porta wrote:
>
> >
Hello!
Is there a grammar (pattern engine) like
https://gate.ac.uk/sale/tao/splitch8.html#chap:jape for OpenNLP?
Thank you!
Hello,
Is there a tool like http://nlp.stanford.edu/software/tokensregex.shtml in
OpenNLP?
Thanks
Damiano
Thanks Michael!
2016-01-04 17:39 GMT+01:00 Michael Schmitz :
> You could use https://github.com/knowitall/openregex or
> https://github.com/knowitall/openregex-scala. They are toolkit-neutral.
>
> Peace. Michael
>
> On Mon, Dec 28, 2015 at 3:56 AM, Damiano Porta
> wrote
Hello everybody
How many surrounding tokens are taken into account to find the entity using
a maxent model?
Basically a maxent model should detect an entity by looking at the surrounding
tokens, right?
I would like to understand:
1. Can I set the number of tokens on the left side?
2. Can I set the
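To make the idea of a configurable context window concrete, here is a minimal sketch in plain Java that emits features for a chosen number of tokens to the left and right of the current token. The replies in this thread point to OpenNLP's WindowFeatureGenerator documentation; this sketch does not use that class, and the name `WindowFeatures` and the feature strings (`w=`, `p1w=`, `n1w=`) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of window features; not OpenNLP's implementation.
final class WindowFeatures {
    /**
     * Builds features for tokens[index] from up to `left` preceding
     * and up to `right` following tokens.
     */
    static List<String> features(String[] tokens, int index, int left, int right) {
        List<String> out = new ArrayList<>();
        out.add("w=" + tokens[index]);
        for (int k = 1; k <= left; k++) {
            int i = index - k;
            if (i >= 0) {
                out.add("p" + k + "w=" + tokens[i]); // k-th previous token
            }
        }
        for (int k = 1; k <= right; k++) {
            int i = index + k;
            if (i < tokens.length) {
                out.add("n" + k + "w=" + tokens[i]); // k-th next token
            }
        }
        return out;
    }
}
```

The two window sizes are independent parameters here, which is the behaviour the question asks about; tokens outside the sentence boundaries are simply skipped.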
docs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
>
>
> On Sun, May 1, 2016 at 5:16 AM, Damiano Porta
> wrote:
> >
> > Hello everybody
> > How many surrounding tokens are kept into account to find the entity
> using
> > a maxen
eaturegen/AdaptiveFeatureGenerator.html
> [2]
>
> https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/SentenceFeatureGenerator.html
>
>
> On Sun, May 1, 2016 at 12:02 PM, Damiano Porta
> wrote:
>
> Hi Jeff!
> Thank you so
Hello everybody,
We built a NER model to find persons (names) inside our documents.
We are looking for the best approach to determine whether a name is
male or female.
Possible solutions:
- Plain dictionary?
- Regex to check the initial and/letters of the name?
- Classifier? (naive bayes? Maxent?)
Tha
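The "plain dictionary" option from the list above could look roughly like the following sketch. It is plain Java, not an OpenNLP component; the class `NameGenderLookup` and its methods are hypothetical. Names listed under both genders, and names missing from the dictionary, fall back to UNCERTAIN, which matches the FEMALE/UNCERTAIN feature values quoted in the replies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical dictionary-based gender lookup; not part of OpenNLP.
final class NameGenderLookup {
    enum Gender { MALE, FEMALE, UNCERTAIN }

    private final Map<String, Gender> byFirstName = new HashMap<>();

    /** Registers a first name; a name seen with both genders becomes UNCERTAIN. */
    void add(String firstName, Gender gender) {
        String key = firstName.toLowerCase(Locale.ROOT);
        Gender known = byFirstName.get(key);
        byFirstName.put(key, known == null || known == gender ? gender : Gender.UNCERTAIN);
    }

    /** Looks up a first name case-insensitively; unknown names are UNCERTAIN. */
    Gender lookup(String firstName) {
        return byFirstName.getOrDefault(firstName.toLowerCase(Locale.ROOT), Gender.UNCERTAIN);
    }
}
```

The lookup result can be used directly or, as suggested in one of the replies, fed to a classifier as one feature among others.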
2016-06-28 18:56 GMT-03:00 Damiano Porta :
>
> > Hello everybody,
> >
> > we built a NER model to find persons (name) inside our documents.
> > We are looking for the best approach to understand if the name is
> > male/female.
> >
> > Possible solutions:
>
> Do you plan to use the surrounding context? If yes, maybe you could try
> > to
> > > split NER in two categories: PersonM and PersonF. Just an idea, never
> > read
> > > or tried anything like it. You would need a training corpus with these
> > > classes.
n
> change/add a feature, evaluate and take notes. Sometimes a feature that we
> are sure would help can destroy the model's effectiveness.
>
> Regards
> William
>
>
> 2016-06-29 7:00 GMT-03:00 Damiano Porta :
>
> > Thank you William! Really appreciated!
> >
>
m a classification task using any machine learning
> algorithm.
>
> Another way would be using the information itself (whether the name fits
> for males, females or both) as a feature when you perform the
> classification.
>
> Best regards,
>
> Mondher
>
> I am n
features
>
> F1 = False
> F2 = True
> F3 = UNCERTAIN
> F4 = 1
> F5 = FEMALE
> F6 = 3
> F7 = FEMALE
> F8 = 4
> F9 = UNCERTAIN
> F10 = 2
> F11 = EMPTY
> F12 = 0
> F13 = EMPTY
> F14 = 0
>
> Of course the choice of features depends on the type of data
un left (in
> > words)Values=NUMERIC
> >
> > In the second example here are the values you have for your features
> >
> > F1 = False
> > F2 = True
> > F3 = UNCERTAIN
> > F4 = 1
> > F5 = FEMALE
> > F6 = 3
> > F7 = FEMALE
n Mon, Jul 4, 2016 at 2:41 PM, Joern Kottmann wrote:
>
> > I was speaking about the second case. We could build a dedicated
> component
> > specialized in extracting properties about already detected entities.
> >
> > Jörn
> >
> > On Mon, Jul 4, 2016 at
Hi William,
We need to update the link; it points to the wrong page and returns Not
Found.
2016-07-05 13:19 GMT+02:00 William Colen :
> It is not that easy. You could start from "Papers implemented by OpenNLP":
>
> https://cwiki.apache.org/confluence/display/OPENNLP/NLP+Papers
>
> I believe th
Hello everyone,
Pardon the stupid question, but I really do not get the point of
training a maxent model with complete sentences.
For example:
Pierre Vinken , 61 years old , will join the board as
a nonexecutive director Nov. 29 .
It has ~20 tokens.
As described here:
https://opennlp.apa
be ignored.
2016-08-12 16:26 GMT+02:00 William Colen :
> You also need examples of what are not entities.
>
>
> 2016-08-12 11:21 GMT-03:00 Damiano Porta :
>
> > Hello everyone,
> > pardon for the stupid question but i really do not get the point about
> > trai
e is
> an entity in the middle of every window.
>
>
> 2016-08-12 11:35 GMT-03:00 Damiano Porta :
>
> > OK, but why not just ignore all the other tokens? I mean... when I
> write 2
> > TOKENS + ENTITY + 2 TOKENS I am interested in finding the entity with
> this
>
"
version of it)
2016-08-12 16:51 GMT+02:00 Damiano Porta :
> Ok thank you so much guys!
>
> 2016-08-12 16:43 GMT+02:00 William Colen :
>
>> You need to train with a corpus that is as close as possible to your
>> runtime corpus. If your runtime corpus is like that I thi
Hello,
After persons, addresses, etc., I also need to extract emails/telephone numbers from my
documents. I just found
https://github.com/apache/opennlp/blob/cac4db6d3cb74ae3414fc8c438eec770af783538/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinderFactory.java
Reading the code it seems to be
Hello,
Pardon all these questions, guys, but I am trying to study OpenNLP in depth.
I wrote some simple code; you can see it here:
https://issues.apache.org/jira/browse/OPENNLP-859?jql=project%20%3D%20OPENNLP
I am trying to understand what the generators are and what their job is.
I know they add fea
p://opennlp.apache.org/documentation/1.6.0/apidocs/
> opennlp-tools/opennlp/tools/namefind/DictionaryNameFinder.html
>
> Regards
> William
>
> 2016-08-16 15:50 GMT-03:00 Damiano Porta :
>
> > Hello,
> >
> > pardon guys for all these questions bu
Hello everybody!
Could someone explain why I should separate each sentence of my documents
to train my models?
My documents are like resumes/CVs and the sentences can be very different.
For example a sentence could also be:
1. Name: John
2. Surname: Travolta
Etc etc
So my question is. What is the
>
>
> Daniel Russ, Ph.D.
> Staff Scientist, Office of Intramural Research
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda, MD 20892-5624
>
> On Aug 25, 2016, at 9:55 AM, Damian
Drive
Bethesda, MD 20892-5624
On Aug 26, 2016, at 5:57 AM, Damiano Porta wrote:
Hi Daniel!
Thank you so much for your opinion.
It makes perfect sense, but I am still a bit confused about the length of
the sentences.
In a resume there are many names, dates etc etc. So my doubt is regardi
Jörn
>
> On Fri, Aug 26, 2016 at 3:25 PM, Damiano Porta
> wrote:
>
> > Hi!
> > Yes, I can train a good model (sure, it will take a lot of time), I have
> 30k
> > resumes. So the "data" isn't a problem.
> > I thought about many things, i am also creati
t pass in the entire document.
Jörn
On Fri, Aug 26, 2016 at 3:25 PM, Damiano Porta <damianopo...@gmail.com>
wrote:
Hi!
Yes, I can train a good model (sure, it will take a lot of time), I have
30k
resumes. So the "data" isn't a problem.
I thought about many things, i am al
Staff Scientist, Office of Intramural Research
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda, MD 20892-5624
>
> On Aug 26, 2016, at 1:46 PM, Damiano Porta <damianopo...@gmail.com>
Pardon, I meant the word "my"...
On 26 Aug 2016 at 20:49, "Damiano Porta" wrote:
> But I think it is the same, no? I mean... I will pass all the content as
> one sentence. So in this case the word "the" will be tagged the same.
>
> The problem in t
Hello,
I am creating a custom tokenizer. It works pretty well but I have problems
with emails.
Emails can contain _ - . characters, which are split off in normal text, so the
question is: how can I train it better?
After the tokenization I need to apply different regexes to extract
email/dates/telephones so i
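One rule-based way to keep emails intact, as an alternative to retraining the tokenizer, is to match them before applying the general splitting rules. The sketch below is plain Java and does not use any OpenNLP class; `EmailAwareTokenizer` is a made-up name and the email expression is deliberately simplified, so treat both as assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rule-based tokenizer; not an OpenNLP Tokenizer implementation.
final class EmailAwareTokenizer {
    // Alternation order matters: a (simplified) email is tried first,
    // then a run of letters/digits, then any single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+|[\\p{L}\\p{N}]+|\\S");

    /** Splits text into tokens, keeping each email address as one token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

Outside of emails, characters such as `_`, `-` and `.` still become separate tokens. The same tokenization has to be applied to the training data and at runtime, as the replies elsewhere in this archive stress.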
n email or telephone.
>
>
> Regards
> William
>
>
> On Monday, 29 August 2016, Damiano Porta <
> damianopo...@gmail.com> wrote:
>
> > Hello,
> > I am creating a custom tokenizer. It works pretty well but i have
> problems
> > with e
ok, thanks!
2016-09-10 23:46 GMT+02:00 William Colen :
> When I need to, I debug the code. I don't know if there is a better way.
>
>
> 2016-09-10 18:24 GMT-03:00 Damiano Porta :
>
> > Hi William!
> > Yeah, I will go with a custom generator that adds specific feat
Hello,
We need to categorize our documents into 80 sectors. These documents are
resumes/CVs.
We have many documents (more than 30k) but there is a problem.
Should we try to extract the job positions inside each resume and
categorize them or can we just add the entire document and categorize it in
one
unity of practitioners (the first mailing list in
> https://opennlp.apache.org/mail-lists.html).
>
> Cohan
>
>
> On Sat, Sep 24, 2016 at 7:12 PM, Damiano Porta
> wrote:
>
> > Hello,
> > we need to categorize our documents in 80 sectors. These documents are
> > resumes/cv.
Hello,
I have created a custom generator implementing the AdaptiveFeatureGenerator
interface.
I am getting this error:
Exception in thread "main"
opennlp.tools.util.ext.ExtensionNotLoadedException:
java.lang.InstantiationException:
com.damiano.parser.generator.SpanFeatureGenerator
at
opennlp.tool
);
System.exit(1);
}
It is obviously a test to understand if my generator is called.
2016-10-25 12:23 GMT+02:00 Joern Kottmann :
> What is the constructor of the
> com.damiano.parser.generator.SpanFeatureGenerator
> class?
>
> Jörn
>
> On Tue, Oct 25, 2016 at 11:51 AM, D
Joern,
However, I also tried with:
public SpanFeatureGenerator(Map<String, String> properties,
FeatureGeneratorResourceProvider resourceProvider) throws
InvalidFormatException {
}
but I get the same exception.
Damiano
2016-10-25 12:30 GMT+02:00 Damiano Porta :
> This at the moment:
>
>
reGenerator.
> That one has an init method which gives you the attributes defined in the
> xml descriptor.
>
> HTH,
> Jörn
>
> On Tue, Oct 25, 2016 at 12:43 PM, Damiano Porta
> wrote:
>
> > Joern,
> > However i also tried wi
on the custom element in the
> descriptor.
>
> This is optional if you don't have any parameters, you don't need to pass
> anything at all.
>
> Jörn
>
>
> On Tue, Oct 25, 2016 at 2:00 PM, Damiano Porta
> wrote:
>
> > Oh thanks! i try.
> > If I ex
Hello,
I am getting a strange error while building a NER model.
Basically, the end of the build output is:
98: ... loglikelihood=-13340.018762351776 0.999005934601099
99: ... loglikelihood=-13258.358751926637 0.9990120681028991
100: ... loglikelihood=-13178.039964721707 0.9990177634
do not have other info.
Do I have to create a custom Serializer too?
2016-10-27 22:04 GMT+02:00 Joern Kottmann :
> On Thu, 2016-10-27 at 21:18 +0200, Joern Kottmann wrote:
> > On Tue, 2016-10-25 at 18:49 +0200, Damiano Porta wrote:
> > >
> > > i am getting a strange er
Hello,
Could someone explain how to add a dictionary resource when training
a NER model?
At the moment I add a map of resources like this:
try (InputStream modelIn = new FileInputStream("/home/damiano/fake.xml")) {
Dictionary dictionary = new Dictionary(modelIn);
map.put("damiano", dictionary)
opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:153)
... 4 more
2016-10-28 12:55 GMT+02:00 Joern Kottmann :
> Try to rename the dictionary key to xyz.dictionary then the serializer will
> be mapped correctly.
>
> Jörn
>
> On Thu, Oct 27, 2016 at 11:14 PM, Damiano Porta
> wrote:
>
>
Pardon, my mistake: I forgot to change dict="damiano"/> into dict="damiano.dictionary"/> in my train.xml.
Now it is working well, and the .bin has my dictionary too.
2016-10-28 13:51 GMT+02:00 Damiano Porta :
> Jorn
> I changed the code as you told me, exactly this: https://gist.github.com/
> anonymous/8877b09d441d2e64c181fa9
is the my xml descriptor:
Damiano
2016-10-28 14:00 GMT+02:00 Damiano Porta :
> Pardon, my mistake: I forgot to change dict="damiano"/> into dict="damiano.dictionary"/> in my train.xml
>
>
ok! thank you Jorn!
2016-10-29 13:54 GMT+02:00 Joern Kottmann :
> The class has to be on your classpath otherwise it can't be loaded.
>
> Jörn
>
> On Fri, 2016-10-28 at 22:59 +0200, Damiano Porta wrote:
> > Jorn,
> > as I wrote i have created the ner mod
Jorn
What suffix should I use if I need a postagger model in a FeatureGenerator?
For the dictionary I use mydictionary.dictionary as you told me. What about
postagger .bin?
Thanks
Damiano
On 29 Oct 2016 at 14:27, "Damiano Porta" wrote:
> ok! thank you Jorn!
>
> 2016-10-29 13
ill need a hack.
>
> 2016-10-30 12:59 GMT-02:00 Damiano Porta :
>
> > Jorn
> > what suffix should i use if i need a postagger model in a
> FeatureGenerator?
> >
> > For dictionary i use mydictionary.dictionary as you told me. What about
> > postagger .bin?
>
ger for each call.
> Maybe you can store the output to a cache.
>
> 2016-10-31 10:49 GMT-02:00 Damiano Porta :
>
> > Hmm, OK, I load the postagger in the constructor of my custom
> > FeatureGenerator, this is an example:
> >
> > https://gist.github.com/ano
Hello,
Can we implement coreference resolution in some way?
Thanks
Damiano
Hello,
Is there something I can use for image (face in this case) recognition?
Thanks
Damiano
Hello,
Do you have an example or a test to see how the EntityLinker works?
Thanks
Damiano
oking at
> the code will probably make sense. The GeoEntityLinker is in the ADDONS
> repo.
>
> On Sat, Nov 26, 2016 at 5:51 PM, Damiano Porta
> wrote:
>
> > Hello,
> > do you have an example or a test to see how the EntityLinker works?
> > Thanks
> >
> > Damiano
> >
>
Hello,
There is no LemmatizerME class in OpenNLP 1.6.0
(
https://github.com/apache/opennlp/blob/trunk/opennlp-tools/src/main/java/opennlp/tools/lemmatizer/LemmatizerME.java
)
I have this dependency:
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>1.6.0</version>
</dependency>
How
never mind...
https://opennlp.apache.org/maven-dependency.html#opennlp-tools-snapshot-dependency
2016-12-04 12:14 GMT+01:00 Damiano Porta :
> Hello,
> there is not LemmatizerME class in OpenNLP 1.6.0
> (https://github.com/apache/opennlp/blob/trunk/opennlp-
> tools/src/main/java/o
Hello,
I am doing some tests with the LemmatizerME.
It is returning a wrong word, a word that never occurs in the training
data. Basically it is NOT an Italian word :)
The output is:
[O, O, O, O, *R1trR0ae*]
The code:
try (InputStream in = new
FileInputStream("/home/damiano/lemmas.bin")
d predicted lemma classes, to perform the decoding (apply the
> permutations) and output the actual lemma (iniziare in your example).
>
> Cheers,
>
> Rodrigo
>
> On Mon, Dec 5, 2016 at 11:19 AM, Damiano Porta
> wrote:
>
> > Hello,
> > I am doing some tests wit
required to go from the word form to
> the lemma. This is because it is much easier to generalize (e.g., many
> word-lemma pairs are captured by the same permutation class) to learn over
> those permutation classes than on the lemmas themselves.
>
> HTH,
>
> Rodrigo
>
>
>
Hello again Mark,
Pardon the late reply.
Basically I would like to link entities (of different types) inside sentences.
I do not need to link them to an external resource; I am referring to
co-reference. Is EntityLinker good for that?
Thanks
2016-11-27 0:18 GMT+01:00 Damiano Porta :
> Tha
Hmm, OK, so I must use an external tool for coreference.
2016-12-06 12:24 GMT+01:00 Mark G :
> No, sorry, I don't think it would be any help for that.
>
> Sent from my iPhone
>
> > On Dec 6, 2016, at 5:57 AM, Damiano Porta
> wrote:
> >
> > Hello aga
g.
>
> On Tue, Dec 6, 2016 at 7:07 AM, Damiano Porta
> wrote:
>
> > Hmm, ok, so i must use an external tool for coreference.
> >
> > 2016-12-06 12:24 GMT+01:00 Mark G :
> >
> > > No , sorry, I don't think it would be any help for that.
> > >
Hello,
Is it possible to get/pass the original text inside a custom NER feature
generator somehow?
Thanks
Damiano
original text too and analyze it inside the
createFeatures() callback.
Is it possible somehow?
2016-12-19 13:54 GMT+01:00 Joern Kottmann :
> Hello,
>
> the question is not clear to me. The feature generator sees the current
> sentence only.
>
> Jörn
>
> On Sat, Dec 17,
Hello,
Using the find() method of NameFinderME I get a Span[]. Is it possible to get the
list of outcomes as a String[] with the BIO codec?
Thanks
Damiano
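If no ready-made method turns out to be available, spans can be converted to per-token tags by hand. The following is a minimal sketch in plain Java with a stand-in `TypedSpan` class, so it runs without OpenNLP. It assumes an exclusive end index (as with `opennlp.tools.util.Span`) and uses the generic `B-`/`I-`/`O` notation, which is not necessarily the outcome format OpenNLP uses internally.

```java
import java.util.Arrays;

// Hypothetical converter from typed spans to BIO tags; not an OpenNLP class.
final class BioEncoder {
    /** Minimal stand-in for a typed span; end is exclusive. */
    static final class TypedSpan {
        final int start;
        final int end;
        final String type;

        TypedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Returns one tag per token: B-type, I-type, or O for tokens outside any span. */
    static String[] encode(int tokenCount, TypedSpan... spans) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O");
        for (TypedSpan s : spans) {
            tags[s.start] = "B-" + s.type;
            for (int i = s.start + 1; i < s.end; i++) {
                tags[i] = "I-" + s.type;
            }
        }
        return tags;
    }
}
```

With real OpenNLP spans the same loop would read the values from `getStart()`, `getEnd()` and `getType()`. The sketch assumes the spans do not overlap.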
Manoj,
You can add custom features using a generator that implements this:
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/FeatureGenerator.java
Take a look at
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doc
Hello everybody,
I am using the NameFinder tool with a custom TokenNameFinderModel model.
I built this model using many DictionaryFeatureGenerators that use
dictionaries I loaded during training.
TokenNameFinderFactory factory = new TokenNameFinderFactory(
IOUtils.toByteArray(in),
Hello, how can we use word2vec as a feature generator for NER?
I just have found on documentation,
can anyone give me more details?
Thanks
as explained
> in the manual:
>
> http://opennlp.apache.org/documentation/1.7.1/manual/
> opennlp.html#tools.namefind.training
>
> HTH,
>
> R
>
> On Wed, Jan 25, 2017 at 8:09 PM, Damiano Porta
> wrote:
>
> > Hello, how can we use word2vec as featuregener
Aha, yeah, it helped me to understand the input and output formats. OK, I will
try to create clusters using the official tool. Thanks!
Damiano
On 25 Jan 2017 at 21:54, "Rodrigo Agerri" wrote:
It might, I forgot that :)
R
On Wed, Jan 25, 2017 at 9:43 PM, Damiano Porta
wrote:
I have good results with perceptron, but +1 for CRF
2017-02-07 15:42 GMT+01:00 Russ, Daniel (NIH/CIT) [E] :
> Hi Jörn,
>
>
>
> I think the best entity recognition systems use CRFs. At some point
> we might want to consider adding them. As you know, ME classifiers suffer
> from label bias pr
Hello everybody,
I have created a custom tokenizer that does not split specific "patterns"
like emails, telephone numbers, dates, etc. I convert each of them into ONE single token.
The other parts of the text are tokenized with the
SimpleTokenizer.
The problem is when I need to train a NER model. For example if my
intable (though possible not an alphanumeric character like an
> underscore)?
> Daniel
>
> On 3/2/17, 11:46 AM, "Damiano Porta" wrote:
>
> Hello everybody,
>
> i have created a custom tokenizer that does not split specific
> "patterns"
> li
at 3011234567.” even though
> your regex won't match (if you look at the previous 4 words to catch “call
> me”).
>
>
> Daniel
>
> On 3/2/17, 12:24 PM, "Damiano Porta" wrote:
>
> Hello Daniel, yes exactly, i do that. I am using regexes to f
tokenizer will do it and how it is
> annotated in the training data. In any case, the most important thing is
> for the tokenization to be consistent for training and testing.
>
> HTH,
>
> Rodrigo
>
> ...
>
> On Thu, Mar 2, 2017 at 5:46 PM, Damiano Porta
> wrote
ou will be fine.
> As long as in testing time you use the same tokenization.
>
> Cheers,
>
> R
>
> On Thu, Mar 2, 2017 at 11:24 PM, Damiano Porta
> wrote:
>
> > Hi Rodrigo, thanks for your message.
> > My problem is that dates do not follow a correct format
Hello everybody,
I think I found a bug in NameSample. This is the use case:
String[] tokens = new String[] {
"0",
"1",
"2",
"3",
"4",
",",
"6",
"7",
"8"
};
Span[] spans = new Span[] {
new Span(7,8, "zipcode"),
new Span(1,7, "address"),
};
NameSample n = new NameSample(tokens, spans, true);
the
Hello everybody,
Does OpenNLP support CUDA parallel computing?
Damiano
Hello,
I am training a NER model with a perceptron classifier (using OpenNLP 1.7.0).
The output of the training is:
Indexing events using cutoff of 0
Computing event counts... done. 11861603 events
Indexing... done.
Collecting events... Done indexing.
Incorporating indexed data for training...
d
it is doing.
Damiano
2017-03-06 10:19 GMT+01:00 Joern Kottmann :
> Hello,
>
> this looks like output from the cross validator.
>
> Jörn
>
> On Sun, Mar 5, 2017 at 11:34 AM, Damiano Porta
> wrote:
>
> > Hello,
> >
> > I am training a NER model with
e probably add support for one of
> the deep learning packages and those usually use CUDA.
>
> Jörn
>
> On Sat, Mar 4, 2017 at 5:17 PM, Damiano Porta
> wrote:
>
> > Hello everybody,
> >
> > does OpenNLP support CUDA parallel computing?
> >
> > Damiano
> >
>
>
> Jörn
>
> On Mon, Mar 6, 2017 at 10:29 AM, Damiano Porta
> wrote:
>
> > Hello Jorn,
> > I tried with 300 iterations and it takes forever; reducing that number to
> > 100, I can finally get the model in half an hour.
> >
> > The problem with 300
f you use 300 instead of 100 it should take three times as
> long.
>
> Jörn
>
> On Mon, Mar 6, 2017 at 11:12 AM, Damiano Porta
> wrote:
>
> > Jorn,
> > I am training and testing the model via the API. If it is not a training
> > problem, how is it possible tha
this is repeated n times, so that each
> partition was once used for testing.
>
> It really should be three times as long in your case, maybe there is
> something else wrong?
>
> Jörn
>
> On Mon, Mar 6, 2017 at 12:36 PM, Damiano Porta
> wrote:
>
> > Unfort
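The cross-validation procedure described in the reply above (each partition used once for testing) can be sketched in plain Java as follows. This is only an illustration of the idea: the class `KFold` is hypothetical and the round-robin assignment of samples to folds is an assumption, not a description of how OpenNLP's cross validator partitions data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical k-fold partitioner; not OpenNLP's cross-validation code.
final class KFold {
    /**
     * Returns, for each fold, the sample indices held out for testing.
     * All remaining indices form that fold's training set.
     */
    static List<List<Integer>> testFolds(int sampleCount, int folds) {
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < sampleCount; i++) {
            result.get(i % folds).add(i); // round-robin assignment
        }
        return result;
    }
}
```

Because a separate model is trained for every fold, running n-fold cross-validation costs roughly n training runs, which is worth keeping in mind when comparing its running time against a single training run.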
issue?
2017-03-06 13:43 GMT+01:00 Damiano Porta :
> Oh I see. Thanks!
>
> Basically I have 30k sentences. I apply the labels with a script and then I
> pass 0-15k to train the model (to build the .bin) and 15k-30k to evaluate
> it.
>
> I am trying to build the model with
Hello everybody,
I need a coreference solution to link my entities (DATE, PERSON, ORG). Can
someone show me how to start working on that?
Thank you so much.
Damiano
was any other work not mentioned in that issue.
>
> Hope that helps
> Bruno
>
> From: Damiano Porta
> To: dev@opennlp.apache.org
> Sent: Thursday, 18 May 2017 10:54 PM
> Subject: CoReference
>
>
>
> Hello everybody,
>
> i
Oh, my mistake. Pardon.
Do we have accuracy statistics?
On 18 May 2017 at 14:59, "Joern Kottmann" wrote:
> This is for linking entities in one document, e.g. first name mention to a
> full name mention, or to he, she, it.
>
> Jörn
>
> On Thu, May 18, 2017 at 1:2
Do you also have an example? :)
On 18 May 2017 at 16:35, "Damiano Porta" wrote:
> Oh, my mistake. Pardon.
> Do we have accuracy statistics?
>
> On 18 May 2017 at 14:59, "Joern Kottmann" wrote:
>
>> This is for linking entities in one document, e.g. fir
are more than welcome to get this back into opennlp-tools.
>
> Jörn
>
> On Thu, May 18, 2017 at 4:37 PM, Damiano Porta
> wrote:
>
> > Do you also have an example? :)
> >
> > On 18 May 2017 at 16:35, "Damiano Porta"
> wrote:
> >
> > > Oh
Hello,
Do you think a StemmerFeatureGenerator can be useful for NER models?
I can create a PR for it.
Damiano
Jorn, what is the current performance with CoNLL 2003?
2017-05-26 17:43 GMT+02:00 Joern Kottmann :
> Hello,
>
> can you post performance numbers? Only if it helps with some data set it
> would make sense to add it.
>
> Jörn
>
> On Thu, May 25, 2017 at 3:10 PM, Damiano Por
n
>
> On Fri, May 26, 2017 at 6:16 PM, Damiano Porta
> wrote:
>
> > Jorn, what is the current performance with CoNLL 2003?
> >
> > 2017-05-26 17:43 GMT+02:00 Joern Kottmann :
> >
> > > Hello,
> > >
> > > can you post performance
Hello,
Can we not use the AdditionalContextFeatureGenerator for training?
I do not see the *ne=* feature during training... only the generators
inside my XML are able to add features. How can I see if this custom
context is being used?
I pass the context in the NameSample:
Hello,
I am using the POSTaggerFeatureGenerator via generators.xml.
During training I add this model to the resources like this:
HashMap<String, Object> map = new HashMap<>();
map.put("postagger.bin", myPostaggerModel);
factory = new TokenNameFinderFactory(
IOUtils.toBy
Hello,
I am getting very strange results with the *TokenNameFinderCrossValidator* API.
My generators.xml is:
CODE:
*try (O