Handling of Quotes

Ian Jackson Thu, 28 Mar 2013 06:55:24 -0700

I used the prebuilt models for the SentenceModel (en-sent.bin), TokenizerModel 
(en-token.bin), and ParserModel (en-parser-chunker.bin) with the following 
sentence:
   The "quick" brown fox jumps over the lazy dog.


The result tags the opening quote as JJ and the closing quote as NN, as 
follows:
(TOP (NP (NP (DT The) (JJ ") (JJ quick) (NN ") (JJ brown) (NN fox) (NNS jumps)) 
(PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))

If I alter the sentence by replacing the opening double quote with two 
backquotes (``) and the closing double quote with two single quotes (''), 
following the Penn Treebank tokenization conventions 
[http://www.cis.upenn.edu/~treebank/tokenization.html]:
   The `` quick '' brown fox jumps over the lazy dog

The results are as follows:
(TOP (NP (NP (DT The) (`` ``) (JJ quick) ('' '') (JJ brown) (NN fox) (NNS 
jumps)) (PP (IN over) (NP (DT the) (JJ lazy) (NN dog))) (. .)))
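
The substitution described above could be automated on the token array before 
it reaches the parser. The following is only a minimal sketch of that idea, not 
an OpenNLP feature: QuoteNormalizer and normalizeQuotes are hypothetical names, 
and the code assumes the tokenizer emits each straight double quote as its own 
token and that quotes strictly alternate between opening and closing (no 
nested or unbalanced quotes).

```java
class QuoteNormalizer {

    // Hypothetical helper (not part of OpenNLP): replaces each straight
    // double-quote token with a Penn Treebank-style quote token,
    // alternating between opening (``) and closing ('').
    // Assumes quotes are balanced and not nested.
    static String[] normalizeQuotes(String[] tokens) {
        String[] out = new String[tokens.length];
        boolean opening = true;
        for (int i = 0; i < tokens.length; i++) {
            if ("\"".equals(tokens[i])) {
                out[i] = opening ? "``" : "''";
                opening = !opening;
            } else {
                out[i] = tokens[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "\"", "quick", "\"", "brown", "fox",
                           "jumps", "over", "the", "lazy", "dog", "."};
        // Join with spaces to get whitespace-separated parser input.
        System.out.println(String.join(" ", normalizeQuotes(tokens)));
    }
}
```

The normalized tokens, joined with spaces, can then be given to the parser in 
the same way as the hand-edited sentence above.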

Does a method exist to configure the tokenizer to handle quotes within a 
sentence?
