> Ok. I see something suspicious here. The for loop:
>
> ######
> for l in xx:
>     train_tokens.append(l)
> ######
>
> assumes that we get tokens from the 'xx' token. Is this true? Are you
> sure you don't have to specifically say:
>
> ######
> for l in xx['SUBTOKENS']:
>     ...
> ######

Hi Enas,

I'm taking python-list out of the CC again; my apologies to the others for
not catching that sooner.
Enas, please do not crosspost to multiple mailing lists. You have been
doing this since at least June:
http://mail.python.org/pipermail/python-list/2005-June/287371.html
http://mail.python.org/pipermail/tutor/2005-June/039351.html
http://mail.python.org/pipermail/tutor/2005-July/039642.html
http://mail.python.org/pipermail/python-list/2005-July/289505.html
http://mail.python.org/pipermail/tutor/2005-October/042155.html
http://mail.python.org/pipermail/python-list/2005-October/303805.html
If you crosspost, we at Tutor won't be able to see responses that go to
python-list, and vice versa. The end result clutters both lists and isn't
friendly to either community. Please read:
http://www.gweep.ca/~edmonds/usenet/ml-etiquette.html
and try to change your habits in this area.
Anyway, here is a concrete example of what I mean about 'SUBTOKENS':
######
>>> from nltk.tokenizer import *
>>> text_token = Token(TEXT='hello world this is a test')
>>> text_token.keys()
['TEXT']
>>> WhitespaceTokenizer().tokenize(text_token)
>>> text_token.keys()
['TEXT', 'SUBTOKENS']
>>> text_token['SUBTOKENS']
[<hello>, <world>, <this>, <is>, <a>, <test>]
>>> type(text_token['SUBTOKENS'][0])
<class 'nltk.token.Token'>
######
Do you understand this code, or is there something here that you're not
familiar with?
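If it helps, here is a rough sketch of the same point that runs without
NLTK installed. It is only a stand-in: 'xx' is a plain dict rather than a
real Token, and whitespace_tokenize() is a made-up helper, not part of
NLTK. I'm assuming the container behaves like a dict when you loop over it;
the real Token may differ, so please check it in your own interpreter.

```python
# Hypothetical stand-in for the NLTK token: a plain dict, NOT the real Token class.
def whitespace_tokenize(token):
    """Add a 'SUBTOKENS' entry to the token in place (made-up helper)."""
    token['SUBTOKENS'] = token['TEXT'].split()

xx = {'TEXT': 'hello world this is a test'}
whitespace_tokenize(xx)

# Looping over the container itself gives its keys, not the words.
wrong_tokens = []
for l in xx:
    wrong_tokens.append(l)

# Looping over xx['SUBTOKENS'] gives the individual words.
train_tokens = []
for l in xx['SUBTOKENS']:
    train_tokens.append(l)

print(wrong_tokens)
print(train_tokens)
```

Comparing the two printed lists should make it clear which loop you want.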
Good luck to you.