On Wed, Nov 18, 2015 at 07:22:08PM -0800, John Jason Jordan wrote:
> Thanks to you and the others who responded. 
> 
> Evidently I left out the fact that I found python-nltk in Synaptic
> package manager and installed it without error. That part is done. My
> problem is loading the data. To repeat what I said about this
> previously:
> 
> Then I tried to follow this to download the data:
> -----
> from the nltk.org/install page:
> To install the data, first install NLTK (see
> http://nltk.org/install.html), then use NLTK’s data downloader as
> described below.
> ...
> Reading through the rest of the download options the only one that made
> any sense was:
> Run the command python -m nltk.downloader all
> -----
> 
> But this command just gave a 404 (not found) error.
> 
> How do I get the nltk data?

It looks like the NLTK project moved from Google's defunct code hosting
to GitHub. The python-nltk code has an index location embedded in it,
and the version in Ubuntu 14.04 has not been updated to take the move
from Google to GitHub into account.

The current correct index URL is

  https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

I don't see a way to specify a different index URL on the command line,
but it can be done programmatically. The attached quick and dirty
scriptlet started the download process for me; I killed it before it
finished. Save the file and, if you want, edit the bit that defines
DOWNLOAD_DIR to install the data into your preferred location, and the
CORPUS field if you want something other than "all". Run it with
"python download-nltk.py".
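
For anyone reading this without the attachment, here is a minimal sketch
of the same idea. It is not the attached script, just an illustration
that assumes nltk.downloader.Downloader accepts server_index_url and
download_dir keyword arguments, as it does in the NLTK versions I know
of. The fetch() helper, the ~/nltk_data default and the --download flag
are my own placeholders; the flag is only there so that running the file
by accident does not start a very large download.

```python
import os
import sys

# Index location on GitHub, as given above.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"

# Assumed default; edit to install the data somewhere else.
DOWNLOAD_DIR = os.path.expanduser("~/nltk_data")

# "all" fetches everything; use a single package id for less.
CORPUS = "all"


def fetch(corpus=CORPUS, download_dir=DOWNLOAD_DIR, index_url=INDEX_URL):
    """Download `corpus` using an explicit index URL instead of the built-in one."""
    # Imported here so the settings above can be inspected without NLTK installed.
    from nltk.downloader import Downloader

    downloader = Downloader(server_index_url=index_url,
                            download_dir=download_dir)
    return downloader.download(corpus)


if __name__ == "__main__":
    if "--download" in sys.argv[1:]:
        fetch()
    else:
        # Dry run: show the settings without touching the network.
        print("Would fetch '%s' into %s" % (CORPUS, DOWNLOAD_DIR))
        print("Index: %s" % INDEX_URL)
        print("Re-run with --download to start.")
```

Saved under a name of your choosing, it would be started with
"python <file> --download"; without the flag it only prints the settings.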

Good luck.
-- 
Steve Beattie
<[email protected]>
http://NxNW.org/~steve/


_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
