Re: [Tutor] NLTK

Ishan Puri Sat, 29 Aug 2009 11:11:42 -0700

Hi,
    Yes! It works! I guess what I am asking is: how did you know to use
wordlists.words('IM50re.txt')? Is this a specific command? I ask because I
don't think it was in the book.
        Thanks.
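On the question of where words() comes from: as far as I know it is not special to one text. In NLTK, nltk.corpus.gutenberg is itself loaded with PlaintextCorpusReader, so a reader built over your own folder offers the same methods (fileids(), words(), and so on), and book analyses that take a list of words, such as FreqDist, work on the result. The sketch below assumes NLTK 3 is installed. So that it runs anywhere, it writes a small hypothetical stand-in for IM50re.txt to a temporary folder. The sample text is illustrative only.

```python
import os
import tempfile

from nltk import FreqDist
from nltk.corpus import PlaintextCorpusReader

# Hypothetical stand-in for IM50re.txt; in practice point corpus_root at the
# folder that already holds your file, e.g. r'C:\Users\Ishan\Documents'.
SAMPLE_TEXT = "the cat sat on the mat.\nthe dog sat too!\n"

with tempfile.TemporaryDirectory() as corpus_root:
    with open(os.path.join(corpus_root, "IM50re.txt"), "w", encoding="utf-8") as f:
        f.write(SAMPLE_TEXT)

    # The second argument is a filename (or regex) relative to corpus_root.
    wordlists = PlaintextCorpusReader(corpus_root, "IM50re.txt")
    file_ids = wordlists.fileids()

    # Same method name as nltk.corpus.gutenberg.words(); it returns a lazy
    # sequence of word and punctuation tokens read from the file.
    my_words = wordlists.words("IM50re.txt")
    word_count = len(my_words)
    first_words = list(my_words[:5])

    # Book analyses that take a list of words work on my_words as well.
    freq = FreqDist(my_words)

print(file_ids, word_count, first_words, freq["the"])
```

With your real file you would skip the temporary folder and pass your own corpus_root, exactly as in the session quoted below.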





________________________________
From: Kent Johnson <[email protected]>
To: Ishan Puri <[email protected]>
Cc: *tutor python <[email protected]>
Sent: Saturday, August 29, 2009 3:34:09 AM
Subject: Re: [Tutor] NLTK

On Fri, Aug 28, 2009 at 10:16 PM, Ishan Puri <[email protected]> wrote:

>>>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
>>>> len(emma)
> 192427
>
> So this is the number of words in 'austen-emma.txt'. How would I do this
> with my IM50re.txt? It seems the code "nltk.corpus.gutenberg.words" is
> specific to the Gutenberg corpus installed with NLTK.
> Like this, many examples are given for different analyses that can be done
> with NLTK. However, they all seem to be specific to one of the texts above
> or to another one already installed with NLTK. I am not sure how to apply
> these examples to my own corpus.

This is pretty much the next line in the "Loading your own Corpus"
example. After
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root=r'C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']

you should be able to do:
>>> my_words = wordlists.words('IM50re.txt')
>>> len(my_words)
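A point worth knowing when reading len(my_words): with default settings, PlaintextCorpusReader tokenizes using NLTK's WordPunctTokenizer, whose pattern is \w+|[^\w\s]+. Punctuation marks are therefore counted as tokens alongside words. The standard-library sketch below approximates both counts without NLTK. The helper names are hypothetical, and the approximation assumes the reader's default tokenizer.

```python
import re

# Same pattern NLTK's WordPunctTokenizer uses: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")
# Only the runs of word characters (letters, digits, underscore).
WORD_ONLY = re.compile(r"\w+")


def word_punct_tokens(text):
    """Hypothetical helper: split text roughly as the default reader does."""
    return WORD_PUNCT.findall(text)


def approx_token_count(text):
    """Approximates len(reader.words(fileid)) for the same text."""
    return len(word_punct_tokens(text))


def alnum_word_count(text):
    """Counts only word-character runs, leaving punctuation out."""
    return len(WORD_ONLY.findall(text))


if __name__ == "__main__":
    sample = "the cat sat on the mat.\nthe dog sat too!\n"
    print(approx_token_count(sample), alnum_word_count(sample))  # prints: 12 10
```

To compare with the NLTK figure for a real file, read it with the same encoding the reader uses (UTF-8 by default) and pass the contents to approx_token_count.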

Kent

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
