On Wed, Nov 18, 2015 at 10:41:48PM -0800, Steve Beattie wrote:
> On Wed, Nov 18, 2015 at 07:22:08PM -0800, John Jason Jordan wrote:
> > Thanks to you and the others who responded.
> >
> > Evidently I left out the fact that I found python-ntlk in Synaptic
> > package manager and installed it without error. That part is done. My
> > problem is loading the data. To repeat what I said about this
> > previously:
> >
> > Then I tried to follow this to download the data:
> > -----
> > from the ntlk.org/install page:
> > To install the data, first install NLTK (see
> > http://nltk.org/install.html), then use NLTK’s data downloader as
> > described below.
> > ...
> > Reading through the rest of the download options the only one that made
> > any sense was:
> > Run the command python -m nltk.downloader all
> > -----
> >
> > But this command just gave a 404 (not found) error.
> >
> > How do I get the ntlk data?
>
> It looks like the NLTK project moved from google's defunct code hosting
> to github. The python-nltk code has an embedded index location stored
> within it, which has not been updated in the version in Ubuntu 14.04 to
> take into account the move from google to github.
>
> The current correct index url is
>
>   https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
>
> I don't see a way to specify on the command line to specify a different
> index url, but it can be done programmatically. The attached quick
> and dirty scriptlet started the download process for me; I killed it
> before I let it finish. Save the file and edit the bit that defines
> DOWNLOAD_DIR to install the data into your preferred location if you
> want and the CORPUS field if you want something other than "all".
> Run it by doing "python download-nltk.py".
Bah, forgot the list strips attachments. Here it is inline:
#!/usr/bin/python
import nltk
import os
# adjust the following path to wherever you prefer, by default it
# installs in ${HOME}/nltk_data/
# e.g. change it to DOWNLOAD_DIR='/tmp/nltk'
# note that surrounding the path in quotes is important
DOWNLOAD_DIR=None
# adjust below if you want to download something other than 'all'
CORPUS='all'
if DOWNLOAD_DIR and not os.path.exists(DOWNLOAD_DIR):
    os.mkdir(DOWNLOAD_DIR)

d = nltk.downloader.Downloader(
    server_index_url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml",
    download_dir=DOWNLOAD_DIR)
d.download(CORPUS)
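
Since there doesn't seem to be a command-line option for the index url, a small wrapper could supply one. The following is only a rough sketch, not checked against the NLTK version shipped in Ubuntu 14.04. The option names (--index-url, --dir) and the function names parse_args and main are made up for illustration; the only NLTK call it relies on is the same Downloader(server_index_url=..., download_dir=...) constructor used in the scriptlet above.

```python
import argparse
import os

# Index location taken from the message above.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def parse_args(argv=None):
    """Parse command-line options (all option names here are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Download NLTK data from an explicit index url.")
    parser.add_argument("packages", nargs="*", default=["all"],
                        help="ids to download (default: all)")
    parser.add_argument("--index-url", default=INDEX_URL,
                        help="location of the NLTK data index")
    parser.add_argument("--dir", dest="download_dir", default=None,
                        help="where to install (default: NLTK's own choice)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Imported here so that option parsing still works without NLTK installed.
    import nltk.downloader

    # Create the target directory if one was requested and is missing.
    if args.download_dir and not os.path.isdir(args.download_dir):
        os.makedirs(args.download_dir)

    d = nltk.downloader.Downloader(server_index_url=args.index_url,
                                   download_dir=args.download_dir)
    for package in args.packages:
        d.download(package)
```

To use it as a script, add a final line that calls main(), save it under some name such as download-nltk-cli.py (a hypothetical filename), and run for example "python download-nltk-cli.py --dir /tmp/nltk all".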
--
Steve Beattie
<[email protected]>
http://NxNW.org/~steve/
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
