I used the prebuilt models for the SentenceModel (en-sent.bin), TokenizerModel (en-token.bin), and ParserModel (en-parser-chunker.bin) with the following sentence:

    The "quick" brown fox jumps over the lazy dog.
The result tags the quotes as JJ (the opening quote) and NN (the closing quote):

    (TOP (NP (NP (DT The) (JJ ") (JJ quick) (NN ") (JJ brown) (NN fox) (NNS jumps)) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))

If I instead replace the double quotes with Penn Treebank-style quotes, two backticks (``) for the opening quote and two single quotes ('') for the closing quote [http://www.cis.upenn.edu/~treebank/tokenization.html]:

    The `` quick '' brown fox jumps over the lazy dog

the quotes get their own tags:

    (TOP (NP (NP (DT The) (`` ``) (JJ quick) ('' '') (JJ brown) (NN fox) (NNS jumps)) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))

Is there a way to configure the tokenizer to handle quotes within a sentence?
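Since the second experiment shows the parser tags `` and '' correctly, one possible workaround (not an OpenNLP configuration option, just a post-processing step) is to rewrite each straight " token produced by the tokenizer (e.g. the String[] returned by TokenizerME.tokenize) into its Penn Treebank form before parsing. The sketch below assumes quotes within a sentence are balanced and not nested, so they simply alternate open/close; the class QuoteNormalizer and method toPtbQuotes are hypothetical names used only for illustration.

```java
public class QuoteNormalizer {

    /**
     * Hypothetical helper: replaces straight double-quote tokens (")
     * with Penn Treebank quote tokens, `` for an opening quote and
     * '' for a closing quote. Assumes quotes are balanced and non-nested,
     * so every other quote token is treated as a closing one.
     */
    public static String[] toPtbQuotes(String[] tokens) {
        String[] out = new String[tokens.length];
        boolean nextIsOpening = true;
        for (int i = 0; i < tokens.length; i++) {
            if ("\"".equals(tokens[i])) {
                out[i] = nextIsOpening ? "``" : "''";
                nextIsOpening = !nextIsOpening;
            } else {
                out[i] = tokens[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens as a tokenizer would split: The "quick" brown fox ...
        String[] tokens = {"The", "\"", "quick", "\"", "brown", "fox",
                           "jumps", "over", "the", "lazy", "dog", "."};
        // Prints: The `` quick '' brown fox jumps over the lazy dog .
        System.out.println(String.join(" ", toPtbQuotes(tokens)));
    }
}
```

The alternating rule breaks on unbalanced quotes (e.g. a quotation that continues into the next sentence), so a more robust version might instead decide open versus close from the surrounding whitespace in the original text.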
