Hi,
Yes! It works! I guess what I am asking is: how did you know to use
wordlists.words('IM50re.txt')? Is this a specific command? I don't believe it
was in the book.
Thanks.
________________________________
From: Kent Johnson <[email protected]>
To: Ishan Puri <[email protected]>
Cc: *tutor python <[email protected]>
Sent: Saturday, August 29, 2009 3:34:09 AM
Subject: Re: [Tutor] NLTK
On Fri, Aug 28, 2009 at 10:16 PM, Ishan Puri<[email protected]> wrote:
>>>> emma = nltk.corpus.gutenberg.words('austen-emma.txt')
>>>> len(emma)
> 192427
>
> So this is the number of words in one particular text, 'austen-emma.txt'.
> How would I do this with my IM50re.txt? It seems the code
> "nltk.corpus.gutenberg.words" is specific to the Gutenberg corpus installed
> with NLTK.
> Many examples like this are given for the different analyses that can be done
> with NLTK. However, they all seem to be specific to one of the texts above or
> to another one already installed with NLTK. I am not sure how to apply these
> examples to my own corpus.
This is pretty much the next line in the "Loading your own Corpus"
example. After
>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = r'C:\Users\Ishan\Documents'
>>> wordlists = PlaintextCorpusReader(corpus_root, 'IM50re.txt')
>>> wordlists.fileids()
['IM50re.txt']
you should be able to do
>>> my_words = wordlists.words('IM50re.txt')
>>> len(my_words)
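
If you want to try the same steps without depending on a particular Windows
path, here is a minimal sketch. The temporary directory, the file name
sample.txt and the helper load_words are made up for illustration;
PlaintextCorpusReader and its words() method are the same ones used above. It
assumes NLTK is installed. Note that words() uses the reader's default
tokenizer, which splits punctuation into separate tokens, so the count
includes punctuation marks as well as words.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader


def load_words(corpus_root, fileid):
    """Return the token view for one plain-text file under corpus_root.

    load_words is a hypothetical helper, not part of NLTK.
    """
    # Passing a list keeps the file name from being treated as a regexp.
    reader = PlaintextCorpusReader(corpus_root, [fileid])
    return reader.words(fileid)


# Hypothetical stand-in for IM50re.txt so the sketch runs anywhere.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "sample.txt"), "w", encoding="utf-8") as f:
    f.write("Hello there, world!")

demo_words = load_words(demo_root, "sample.txt")
print(len(demo_words))    # number of tokens, punctuation included
print(list(demo_words))   # the tokens themselves
```

For your own file you would call load_words with your Documents folder and
'IM50re.txt' in place of the demo values.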
Kent