
NLTK

2018-08-06 Thread mausg

I like to analyse text. My method consisted of something like words=text.split(), which would split the text into space-separated units. Then I tried the Python NLTK library, which had a lot of features I wanted, but using `word_tokenize' gives a different answer. What gives? -- m..
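For illustration, here is a minimal sketch of why the two approaches can disagree. It deliberately uses only the standard library: the regular expression is a rough stand-in for a tokenizer that separates punctuation from words, not NLTK's actual algorithm, and the sample sentence is made up for the example. NLTK's own tokenizer applies further rules, so its output may differ from this approximation.

```python
import re

# Hypothetical sample sentence, chosen only for this illustration.
text = "I like to analyse text. What gives?"

# Whitespace splitting keeps punctuation glued to the neighbouring word.
space_units = text.split()

# Rough stand-in for a tokenizer (an assumption, not NLTK's rules):
# a run of word characters is one token, and each punctuation mark
# that is neither a word character nor whitespace is its own token.
regex_tokens = re.findall(r"\w+|[^\w\s]", text)

print(space_units)
print(regex_tokens)
```

With split(), "text." and "gives?" each stay one unit; the regex sketch separates the trailing punctuation into tokens of its own, so the two lists differ in length and content.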

Re: NLTK

2018-08-08 Thread mausg

On 2018-08-07, Stefan Ram wrote:
> Steven D'Aprano writes:
>>In natural language, words are more complicated than just space-separated
>>units. Some languages don't use spaces as a word delimiter.
>
> Even above, the word »units« is neither directly preceded
> nor directly followed by a spac