Yes, you can. See the SentenceDetectorFactory.getSDContextGenerator() method,
along with the SDContextGenerator interface and its default
implementation, DefaultSDContextGenerator.
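To illustrate what a context generator does conceptually: it turns a candidate sentence boundary into a list of string features, and a custom one adds its own features on top of the defaults. The sketch below is plain Python, not the OpenNLP API; the function names and feature names are hypothetical and chosen only for illustration.

```python
def default_features(text, pos):
    """Basic features for the candidate end-of-sentence character at text[pos]."""
    # Take the token fragment before and after the candidate,
    # bounded by the nearest space on each side.
    start = text.rfind(" ", 0, pos) + 1
    end = text.find(" ", pos)
    if end == -1:
        end = len(text)
    prefix = text[start:pos]
    suffix = text[pos + 1:end]
    return ["eos=" + text[pos], "prefix=" + prefix, "suffix=" + suffix]


def custom_features(text, pos):
    """Default features plus one custom feature (hypothetical example)."""
    features = default_features(text, pos)
    # Custom feature: does the next non-space character start with uppercase?
    rest = text[pos + 1:].lstrip()
    if rest and rest[0].isupper():
        features.append("next_cap")
    return features
```

In OpenNLP itself the equivalent step would presumably be done by returning your own SDContextGenerator from a SentenceDetectorFactory subclass, as the reply above suggests.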
On 7 February 2018 at 12:17, Damiano Porta wrote:
> Hello,
> can we add custom features on
Penn Treebank: https://www.cis.upenn.edu/~treebank/
On 16 September 2015 at 21:26, Nishant Kelkar wrote:
> Hi all,
>
> Just wanted to know: what is the data set used to train the default POS
> tagger en-pos-maxent.bin, and where can I download it?
>
> Thanks!
>
> Best
Most likely not. It looks like the first option refers to Penn Treebank tags
(nouns: NN, NNS, etc.; verbs: VB, VBD, etc.; adjectives: JJ, JJR,
JJS; adverbs: RB, etc.) and the second option refers to the WordNet n/v/a/r
tags: (n)oun, (v)erb, (a)djective, adve(r)b. It's a bit strange to see two
type
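For reference, the two tag families mentioned in this reply can be related with a simple prefix rule. This is a minimal sketch, assuming the usual convention that Penn Treebank tags beginning with NN, VB, JJ and RB correspond to the WordNet letters n, v, a and r; the function name is hypothetical, and tags outside those four groups are returned as None.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to a WordNet POS letter, or None."""
    # Match on two-letter prefixes so that, for example, RP (particle)
    # is not mistaken for an adverb.
    prefixes = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}
    return prefixes.get(tag[:2].upper())
```

For example, `penn_to_wordnet("NNS")` gives `"n"` and `penn_to_wordnet("DT")` gives `None`.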
Spanish pos + lemmatizer using this approach.
+1, it would be nice to have control over the dictionary; maybe we can
come up with a format to store it in. That would allow us to easily
include it in our models as a resource for feature generation and
eliminate the dependency on external
If I'm not mistaken and have understood you correctly, it's the Penn Treebank
tagset: http://www.cis.upenn.edu/~treebank/
cheers,
Aliaksandr
2013/1/25 Javier SANCHEZ MONZON javier.sanchez-mon...@unister.de
Hi there
I would like to know whether there is a tagset list for the POS tagging task in
OpenNLP.
Jim, you might use the command-line tools' source code as a hint as well ;)
Aliaksandr
On Fri, Oct 5, 2012 at 5:25 PM, Jim foo.bar jimpil1...@gmail.com wrote:
Hi William,
First of all, thanks for the prompt reply; however, I am using the API, not
the cmd tool...
where do I pass those properties
I had similar issues with JWNL, but that was a long time ago and I don't
remember the details now. A small piece of code to reproduce the issue would
help a lot in looking into it ;)
Aliaksandr
On Mon, Aug 6, 2012 at 10:39 PM, Jörn Kottmann kottm...@gmail.com wrote:
Hello,
never experienced that issue.
Its
Hi Alessandra,
I would like to provide (train) a POS tagger model for the Italian language.
I have some questions:
- may I use a token_tag pair list in place of a sentence list? Something
like:
casa_NOUN
e_CON (conjunction)
This way you lose context. There is a window (a few tokens around the
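The point about losing context can be made concrete with a small sketch. It assumes a window of two tokens on each side (the actual window size used by the tagger is not stated here), and the function name and the Italian example sentence are made up for illustration; this is not OpenNLP code.

```python
def window_features(tokens, i, size=2):
    """Collect the current token plus its neighbours within `size` positions."""
    features = ["w=" + tokens[i]]
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        # Neighbours outside the sentence contribute no feature.
        if 0 <= j < len(tokens):
            features.append("w[%+d]=%s" % (offset, tokens[j]))
    return features


# Token "e" inside a full sentence: four neighbouring words are available.
in_sentence = window_features(["la", "casa", "e", "il", "giardino"], 2)

# The same token given as an isolated token_tag pair: no neighbours at all.
isolated = window_features(["e"], 0)
```

With a sentence list, `in_sentence` carries the surrounding words as features, while `isolated` contains only the token itself, which is the information a bare token_tag list would throw away.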