On 2018-05-25 23:24, Cameron Simpson wrote:
[snip]
You can reduce that list by generating the "wordlist" form from something
smaller:

   base_phrases = ["Kilauea volcano", "government of Mexico", "Hawaii"]
   wordlist = [
       (base_phrase, " ".join([word + "/TAG" for word in base_phrase.split()]))
       for base_phrase in base_phrases
   ]

You could even autosplit the longer phrases so that your base_phrases
_automatically_ becomes:

   base_phrases = ["Kilauea volcano", "Kilauea", "volcano",
                   "government of Mexico", "government", "Mexico", "Hawaii"]

That list should also include "of".

As the OP doesn't want every instance of "of" to be tagged, there could be a separate exceptions list containing the sub-phrases that should not be tagged on their own; those would be dropped from the generated base_phrases list.
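
Something along these lines might do it. This is only a rough sketch: the names expand_phrases and exceptions are made up for illustration, and it assumes that splitting on whitespace is good enough and that duplicates should be kept only once.

```python
def expand_phrases(base_phrases, exceptions=()):
    """Return each phrase followed by its individual words.

    Hypothetical helper: anything listed in `exceptions` is dropped,
    and duplicates are kept only the first time they appear.
    """
    excluded = set(exceptions)
    seen = set()
    expanded = []
    for phrase in base_phrases:
        words = phrase.split()
        # A one-word phrase has no sub-phrases beyond itself.
        candidates = [phrase] + (words if len(words) > 1 else [])
        for candidate in candidates:
            if candidate in excluded or candidate in seen:
                continue
            seen.add(candidate)
            expanded.append(candidate)
    return expanded


base_phrases = ["Kilauea volcano", "government of Mexico", "Hawaii"]
exceptions = ["of"]  # sub-phrases that should not be tagged on their own

expanded = expand_phrases(base_phrases, exceptions)

# Same "wordlist" construction as above, applied to the expanded list.
wordlist = [
    (phrase, " ".join(word + "/TAG" for word in phrase.split()))
    for phrase in expanded
]
```

Note that the exceptions only remove the stand-alone "of"; the "of" inside "government of Mexico" is still tagged as part of that phrase.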

[snip]
--
https://mail.python.org/mailman/listinfo/python-list
