Re: Custom features for sentence detector

2018-02-14 Thread Aliaksandr Autayeu
Yes, you can. See the SentenceDetectorFactory.getSDContextGenerator() method, and correspondingly the SDContextGenerator interface and its default implementation, DefaultSDContextGenerator.

On 7 February 2018 at 12:17, Damiano Porta wrote:
> Hello,
> can we add custom features on

Re: Default POS Tagger Dataset

2015-09-17 Thread Aliaksandr Autayeu
Penn Treebank: https://www.cis.upenn.edu/~treebank/

On 16 September 2015 at 21:26, Nishant Kelkar wrote:
> Hi all,
>
> Just wanted to know: what is the data set used to train the default POS
> tagger en-pos-maxent.bin, and where can I download it?
>
> Thanks!
>
> Best

Re: JWNL bug???

2015-06-10 Thread Aliaksandr Autayeu
Most likely not. It looks like the first option refers to Penn Treebank tags (nouns: N-N, N-NS, etc.; verbs: V-B, V-BD, etc.; adjectives: J-J, J-JR, J-JS; adverbs: R-B, etc.) and the second option refers to WordNet nvar tags: n-oun, v-erb, a-djective, adve-r-b. It's a bit strange to see two type
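
To make the two tag families concrete, here is a minimal sketch that maps a Penn Treebank tag onto the corresponding WordNet letter. The function name `penn_to_wordnet` and the prefix table are assumptions made for illustration; they are not part of JWNL or OpenNLP. Prefixes are matched on two characters rather than one, so that RP (particle) is not mistaken for an adverb.

```python
from typing import Optional

# Hypothetical lookup table: Penn Treebank tag prefix -> WordNet POS letter.
# NN* = nouns, VB* = verbs, JJ* = adjectives, RB* = adverbs.
PENN_PREFIX_TO_WORDNET = {
    "NN": "n",
    "VB": "v",
    "JJ": "a",
    "RB": "r",
}


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Return the WordNet letter for a Penn Treebank tag, or None if
    the tag has no WordNet counterpart (determiners, particles, ...)."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:2].upper())


if __name__ == "__main__":
    for tag in ["NN", "NNS", "VBD", "JJR", "RB", "RP", "DT"]:
        print(tag, "->", penn_to_wordnet(tag))
```

Tags outside the four open classes return None, so a caller can skip the dictionary lookup for them instead of passing an unexpected tag along.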

Re: English lemmatizer using wordnet

2013-04-12 Thread Aliaksandr Autayeu
Spanish pos + lemmatizer using this approach. +1, it would be nice to have control over the dictionary, maybe we can come up with a format to store it in. That will allow us to easily include it in our models as a resource for feature generation and eliminates the dependency on external

Re: Tagsets OpenNLP

2013-01-25 Thread Aliaksandr Autayeu
If I'm not mistaken and have understood you correctly, it's the Penn Treebank tagset: http://www.cis.upenn.edu/~treebank/

cheers,
Aliaksandr

2013/1/25 Javier SANCHEZ MONZON javier.sanchez-mon...@unister.de
> Hi there
> I would like to know if there is a tagset list for the POS tagging task in OpenNLP?

Re: NER using perceptron instead of MaXent?

2012-10-05 Thread Aliaksandr Autayeu
Jim, you might use the command-line tools' source code as a hint as well ;)

Aliaksandr

On Fri, Oct 5, 2012 at 5:25 PM, Jim foo.bar jimpil1...@gmail.com wrote:
> Hi William,
> First of all thanks for the prompt reply, however I am using the API not the cmd tool... where do I pass that properties

Re: Anyone see issues with jwnl library hangs?

2012-08-07 Thread Aliaksandr Autayeu
I had similar issues with JWNL, but that was a long time ago and I don't remember the details now. A small piece of code that reproduces the issue would help a lot in looking into it ;)

Aliaksandr

On Mon, Aug 6, 2012 at 10:39 PM, Jörn Kottmann kottm...@gmail.com wrote:
> Hello,
> never experienced that issue. Its

Re: Training a POS tagger model

2012-07-27 Thread Aliaksandr Autayeu
Hi Alessandra,

> I would like to provide (train) a POS tagger model for the Italian language. I have some questions:
> - may I use a token_tag pair list in place of a sentence list? Something like:
> casa_NOUN
> e_CON (conjunction)

This way you lose context. There is a window (a few tokens around the
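
The point about losing context can be illustrated with a short sketch. It assumes the word_tag training format with one whitespace-separated sentence per line; the helper names, the window size of 2, and the Italian example tags are hypothetical and are not taken from OpenNLP.

```python
from typing import List, Tuple


def parse_tagged_sentence(line: str) -> List[Tuple[str, str]]:
    """Split one 'word_tag word_tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last underscore so words containing '_' survive.
        word, sep, tag = token.rpartition("_")
        if not sep or not word or not tag:
            raise ValueError(f"not a word_tag token: {token!r}")
        pairs.append((word, tag))
    return pairs


def context_window(pairs: List[Tuple[str, str]], index: int, size: int = 2):
    """Return the words up to `size` positions before and after `index`."""
    words = [word for word, _ in pairs]
    return words[max(0, index - size):index], words[index + 1:index + 1 + size]


if __name__ == "__main__":
    sentence = parse_tagged_sentence("la_DET casa_NOUN e_CON il_DET giardino_NOUN")
    print(context_window(sentence, 2))   # neighbours of "e" within the sentence

    isolated = parse_tagged_sentence("e_CON")
    print(context_window(isolated, 0))   # no neighbours at all
```

With a full sentence per line, the token "e" has neighbours on both sides that can serve as features; with a single pair per line, both sides of the window are empty.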