On 2018-05-25 23:24, Cameron Simpson wrote:
[snip]
You can reduce that list by generating the "wordlist" form from something
smaller:
    base_phrases = ["Kilauea volcano", "government of Mexico", "Hawaii"]
    wordlist = [
        (base_phrase, " ".join([word + "/TAG" for word in base_phrase.split()]))
        for base_phrase in base_phrases
    ]
You could even autosplit the longer phrases so that your base_phrases
_automatically_ becomes:
    base_phrases = ["Kilauea volcano", "Kilauea", "volcano",
                    "government of Mexico", "government", "Mexico", "Hawaii"]
That list should also include "of", since autosplitting "government of
Mexico" yields every word in the phrase.
As the OP doesn't want every instance of "of" to be tagged, there could
be a separate exceptions list containing the sub-phrases that should
not be tagged; those would be dropped from the generated base_phrases
list.
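
A rough sketch of how the autosplit and the exceptions list might fit
together. The names expand_phrases and exceptions are made up for
illustration and are not from the original code; this assumes that
splitting on whitespace is good enough and that an exception only
removes the standalone sub-phrase, not the longer phrase containing it:

```python
def expand_phrases(base_phrases, exceptions=()):
    """Return each phrase followed by its individual words, without duplicates.

    Sub-phrases listed in `exceptions` (a hypothetical parameter) are dropped.
    """
    excluded = set(exceptions)
    seen = set()
    expanded = []
    for phrase in base_phrases:
        # The full phrase comes first, then each of its words.
        for candidate in [phrase] + phrase.split():
            if candidate in seen or candidate in excluded:
                continue
            seen.add(candidate)
            expanded.append(candidate)
    return expanded


base_phrases = ["Kilauea volcano", "government of Mexico", "Hawaii"]
exceptions = ["of"]  # sub-phrases that should not be tagged on their own

expanded = expand_phrases(base_phrases, exceptions)

# Same "wordlist" construction as in the quoted code, applied to the
# expanded list.
wordlist = [
    (phrase, " ".join(word + "/TAG" for word in phrase.split()))
    for phrase in expanded
]
```

With the exceptions list above, "of" is still tagged as part of
"government of Mexico" but never gets an entry of its own.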
[snip]