On Wed, Nov 18, 2015 at 10:41:48PM -0800, Steve Beattie wrote:
> On Wed, Nov 18, 2015 at 07:22:08PM -0800, John Jason Jordan wrote:
> > Thanks to you and the others who responded. 
> > 
> > Evidently I left out the fact that I found python-nltk in Synaptic
> > package manager and installed it without error. That part is done. My
> > problem is loading the data. To repeat what I said about this
> > previously:
> > 
> > Then I tried to follow this to download the data:
> > -----
> > from the nltk.org/install page:
> > To install the data, first install NLTK (see
> > http://nltk.org/install.html), then use NLTK’s data downloader as
> > described below.
> > ...
> > Reading through the rest of the download options the only one that made
> > any sense was:
> > Run the command python -m nltk.downloader all
> > -----
> > 
> > But this command just gave a 404 (not found) error.
> > 
> > How do I get the nltk data?
> 
> It looks like the NLTK project moved from Google's defunct code hosting
> to GitHub. The python-nltk code has an embedded index location stored
> within it, which has not been updated in the version in Ubuntu 14.04 to
> take into account the move from Google to GitHub.
> 
> The current correct index url is
> 
>   https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
> 
> I don't see a way to specify a different index url on the command
> line, but it can be done programmatically. The attached quick and
> dirty scriptlet started the download process for me; I killed it
> before it finished. Save the file, edit DOWNLOAD_DIR if you want the
> data installed somewhere other than the default location, and edit
> CORPUS if you want something other than "all". Run it by doing
> "python download-nltk.py".

Bah, forgot the list strips attachments. Here it is inline:


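#!/usr/bin/python

import os

import nltk
import nltk.downloader

# adjust the following path to wherever you prefer, by default it
# installs in ${HOME}/nltk_data/
# e.g. change it to DOWNLOAD_DIR='/tmp/nltk'
# note that surrounding the path in quotes is important
DOWNLOAD_DIR = None

# adjust below if you want to download something other than 'all'
CORPUS = 'all'

# current index location on github; the one embedded in the
# Ubuntu 14.04 python-nltk package still points at the old hosting
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"

# create the target directory (and any missing parents) if needed
if DOWNLOAD_DIR and not os.path.exists(DOWNLOAD_DIR):
    os.makedirs(DOWNLOAD_DIR)

# point the downloader at the new index instead of the built-in one
d = nltk.downloader.Downloader(server_index_url=INDEX_URL,
                               download_dir=DOWNLOAD_DIR)

d.download(CORPUS)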
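
If the download still fails with a 404, it may help to check whether the
index url itself is reachable before blaming the downloader. The following
is only a rough sketch: index_is_reachable is a hypothetical helper of my
own naming, not part of NLTK, and it assumes nothing beyond the standard
library's urlopen.

```python
# Sketch only: index_is_reachable is a hypothetical helper, not an NLTK API.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def index_is_reachable(url, opener=urlopen):
    """Return True if fetching url succeeds with HTTP status 200."""
    try:
        response = opener(url)
    except Exception:
        # covers HTTP errors such as 404 as well as network failures
        return False
    try:
        return response.getcode() == 200
    finally:
        response.close()
```

Calling index_is_reachable(INDEX_URL) before d.download(CORPUS) should
tell you whether the problem is the url or something else. The opener
argument is there only so the check can be exercised without a network
connection.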


-- 
Steve Beattie
<[email protected]>
http://NxNW.org/~steve/

_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
