I am running Python 2.7.3 with Enthought Canopy and am having trouble fetching
the 20 Newsgroups dataset. I get an "empty file" error when I run the example
code from this page:
http://scikit-learn.org/stable/datasets/twenty_newsgroups.html
The first two lines of the example fail with the following error:
ReadError: empty file
As far as I can tell, the input coming from the 20 Newsgroups download is
empty, but I have no idea how to fix it.
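One way to check the "empty input" theory is to open the downloaded archive directly with the standard-library `tarfile` module, using the same `"r:gz"` mode that appears in the traceback. This is a minimal sketch, not part of scikit-learn: `describe_archive` is a hypothetical helper, and the archive location is an assumption you have to supply yourself, since the traceback never shows the value of `archive_path`.

```python
import os
import tarfile
import zlib


def describe_archive(archive_path):
    """Return a one-line status for a file that should be a .tar.gz archive.

    Hypothetical diagnostic helper (not part of scikit-learn). Uses only the
    standard library and runs on Python 2.7 and Python 3.
    """
    if not os.path.exists(archive_path):
        return "missing: %s" % archive_path
    size = os.path.getsize(archive_path)
    try:
        # Same mode string as the tarfile.open call in the traceback
        with tarfile.open(archive_path, "r:gz") as tar:
            n_members = len(tar.getmembers())
    except (tarfile.TarError, EOFError, IOError, zlib.error) as exc:
        # Zero-byte, non-gzip, or truncated/corrupt files end up here
        return "unreadable (%d bytes): %s" % (size, exc)
    return "ok (%d bytes, %d members)" % (size, n_members)
```

Call it as `print(describe_archive(archive_path))`. For a zero-byte download it should report `unreadable (0 bytes): empty file`, which would confirm that the problem is the downloaded file rather than the example code.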
Any help would be appreciated.
Thanks,
Nik
Error Trace:
1 from sklearn.datasets import fetch_20newsgroups
----> 2 newsgroups_train = fetch_20newsgroups(subset='train')
Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/sklearn/datasets/twenty_newsgroups.pyc
in fetch_20newsgroups(data_home, subset, categories, shuffle, random_state, remove, download_if_missing)
205 if download_if_missing:
206 cache = download_20newsgroups(target_dir=twenty_home,
--> 207 cache_path=cache_path)
208 else:
209 raise IOError('20Newsgroups dataset not found')
Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/sklearn/datasets/twenty_newsgroups.pyc
in download_20newsgroups(target_dir, cache_path)
87
88 logger.info("Decompressing %s", archive_path)
---> 89 tarfile.open(archive_path, "r:gz").extractall(path=target_dir)
90 os.remove(archive_path)
91
/Applications/Canopy.app/appdata/canopy-1.0.3.1262.macosx-x86_64/Canopy.app/Contents/lib/python2.7/tarfile.pyc
in open(cls, name, mode, fileobj, bufsize, **kwargs)
1676 else:
1677 raise CompressionError("unknown compression type %r" % comptype)
-> 1678 return func(name, filemode, fileobj, **kwargs)
1679
1680 elif "|" in mode:
/Applications/Canopy.app/appdata/canopy-1.0.3.1262.macosx-x86_64/Canopy.app/Contents/lib/python2.7/tarfile.pyc
in gzopen(cls, name, mode, fileobj, compresslevel, **kwargs)
1725 t = cls.taropen(name, mode,
1726 gzip.GzipFile(name, mode, compresslevel, fileobj),
-> 1727 **kwargs)
1728 except IOError:
1729 raise ReadError("not a gzip file")
/Applications/Canopy.app/appdata/canopy-1.0.3.1262.macosx-x86_64/Canopy.app/Contents/lib/python2.7/tarfile.pyc
in taropen(cls, name, mode, fileobj, **kwargs)
1703 if len(mode) > 1 or mode not in "raw":
1704 raise ValueError("mode must be 'r', 'a' or 'w'")
-> 1705 return cls(name, mode, fileobj, **kwargs)
1706
1707 @classmethod
/Applications/Canopy.app/appdata/canopy-1.0.3.1262.macosx-x86_64/Canopy.app/Contents/lib/python2.7/tarfile.pyc
in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel)
1572 if self.mode == "r":
1573 self.firstmember = None
-> 1574 self.firstmember = self.next()
1575
1576 if self.mode == "a":
/Applications/Canopy.app/appdata/canopy-1.0.3.1262.macosx-x86_64/Canopy.app/Contents/lib/python2.7/tarfile.pyc
in next(self)
2332 except EmptyHeaderError:
2333 if self.offset == 0:
-> 2334 raise ReadError("empty file")
2335 except TruncatedHeaderError, e:
2336 if self.offset == 0:
ReadError: empty file
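The last frame shows where the message comes from: `tarfile` hit an empty header at offset 0, meaning the gzip stream decompressed to zero bytes before any tar header was read. A zero-byte file is one way to get there. The sketch below reproduces the same `ReadError` with the standard library only, without scikit-learn or a network connection; the function name and the file name `empty.tar.gz` are made up for illustration and stand in for a failed download.

```python
import os
import shutil
import tarfile
import tempfile


def open_empty_archive():
    """Reproduce the final frame of the traceback with a zero-byte file.

    Hypothetical example: creates an empty file with a .tar.gz name, opens it
    the way the traceback shows, and returns the ReadError message.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        archive_path = os.path.join(tmp_dir, "empty.tar.gz")
        open(archive_path, "wb").close()  # 0 bytes, like a failed download
        try:
            tarfile.open(archive_path, "r:gz")
        except tarfile.ReadError as exc:
            return str(exc)
        return None  # not reached for a zero-byte file
    finally:
        shutil.rmtree(tmp_dir)


print(open_empty_archive())  # prints: empty file
```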
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general