Is there any interest in adding stop word lists to CLDR? Stop words are
very common words that natural language processing algorithms typically
ignore, for example in search engines, word clouds and text
classification.
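
For illustration, here is a minimal Python sketch of how such a list is
typically applied. The tiny STOP_WORDS set and the remove_stop_words
helper are made up for this example; they are not taken from CLDR or
from any existing collection.

```python
import re

# Hypothetical, deliberately tiny English stop word list for illustration
# only; a real list would be much longer and language-specific.
STOP_WORDS = frozenset(
    {"a", "an", "and", "are", "by", "in", "is", "of", "the", "to"}
)


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased word tokens of text that are not stop words."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    print(remove_stop_words("The quick brown fox is in the garden"))
```

A per-locale list, if CLDR had one, would simply take the place of the
hard-coded set above.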

Collections of stop words such as [1] or [2] already exist and could be
used, but I believe that Unicode CLDR would be the best place for such
lists.

Regards,

Marius Spix

[1] https://pypi.org/project/stop-words/
[2] https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip
