For my university end-of-term programming project, which I am doing in
Pharo, I created a method that returns a Bag of all the words used in the
course descriptions, each repeated as many times as it occurs in the text
(I am working with the Udacity API). I want to use them as keywords and
determine which technologies are mentioned most often in each course.

I am wondering how I could remove words such as 'this' from the Bag, and
how to strip punctuation marks attached to words. Should I use the sentence
segmentation in this NLP library
<https://github.com/mark-watson/nlp_smalltalk>, or is there another way to
do that? I don't really need verbs either, so I suppose I should use the
entity recognition from the same library?

One more thing: since this is a university project, it needs to be
sufficiently documented. Is there a way to generate documentation in Pharo?

Cheers,
Myroslava
