Hello,
I have been trying to dump a compressed joblib file, which was working fine
about a month ago. Previously I had an issue with the amount of memory that
joblib compression used, and zlib seemed to be the cause, but I got more
memory to work around that.
However, when I tried to do the same today, I got an error on
decompression. Is this a known issue with joblib? (I haven't changed my code;
I was on vacation for a month.) I upgraded scikit-learn to 0.13 and still see
the issue. The following session demonstrates the steps in my code: loading an
uncompressed classifier object dumped with joblib, compressing it and dumping
the new compressed classifier, then loading the compressed file back, which
is where the error occurs.
$ ipython
Python 2.6.6 (r266:84292, Sep 11 2012, 08:34:23)
Type "copyright", "credits" or "license" for more information.
IPython 0.13 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: from sklearn.externals import joblib
In [2]: clf=joblib.load("classifier.joblib") #Load uncompressed classifier
In [3]: clf
Out[3]:
SGDClassifier(alpha=1e-05, class_weight=None, epsilon=0.1, eta0=0.0,
fit_intercept=True, l1_ratio=0.15, learning_rate='optimal',
loss='log', n_iter=35, n_jobs=1, penalty='l2', power_t=0.5,
random_state=None, rho=None, shuffle=False, verbose=0,
warm_start=False)
In [4]: joblib.dump(clf, "compressedclassifier.joblib", compress=9)
Out[4]: ['compressedclassifier.joblib', 'compressedclassifier.joblib_01.npy.z']
In [5]: clf=joblib.load("compressedclassifier.joblib")
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-5-ad6b23335871> in <module>()
----> 1 clf=joblib.load("compressedclassifier.joblib")
/home/n7/newenv/lib/python2.6/site-packages/sklearn/externals/joblib/numpy_pickle.pyc in load(filename, mmap_mode)
422
423 try:
--> 424 obj = unpickler.load()
425 finally:
426 if hasattr(unpickler, 'file_handle'):
/usr/lib64/python2.6/pickle.pyc in load(self)
856 while 1:
857 key = read(1)
--> 858 dispatch[key](self)
859 except _Stop, stopinst:
860 return stopinst.value
/home/n7/newenv/lib/python2.6/site-packages/sklearn/externals/joblib/numpy_pickle.pyc in load_build(self)
291 "but numpy didn't import correctly")
292 nd_array_wrapper = self.stack.pop()
--> 293 array = nd_array_wrapper.read(self)
294 self.stack.append(array)
295
/home/n7/newenv/lib/python2.6/site-packages/sklearn/externals/joblib/numpy_pickle.pyc in read(self, unpickler)
157 filename = os.path.join(unpickler._dirname, self.filename)
158 array = unpickler.np.core.multiarray._reconstruct(*self.init_args)
--> 159 data = read_zfile(open(filename, 'rb'))
160 state = self.state + (data,)
161 array.__setstate__(state)
/home/n7/newenv/lib/python2.6/site-packages/sklearn/externals/joblib/numpy_pickle.pyc in read_zfile(file_handle)
69 assert len(data) == length, (
70 "Incorrect data length while decompressing %s."
---> 71 "The file could be corrupted." % file_handle)
72 return data
73
AssertionError: Incorrect data length while decompressing <open file 'compressedclassifier.joblib_01.npy.z', mode 'rb' at 0x2d5b5d0>.The file could be corrupted.
In [6]:
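
To make the failing check easier to reason about: judging from the traceback, the assertion in read_zfile compares the length of the decompressed data against an expected length. Below is a minimal, stdlib-only sketch of that kind of length-checked zlib round trip. The header layout, HEADER_WIDTH, and the names write_zblob and read_zblob are hypothetical and chosen for illustration; this is not joblib's actual on-disk format or code.

```python
import io
import zlib

# Hypothetical fixed-width header holding the uncompressed length in hex.
# This is an assumption for illustration, not joblib's real file layout.
HEADER_WIDTH = 20


def write_zblob(handle, data, compress=9):
    """Write a length header followed by the zlib-compressed payload."""
    header = ("%x" % len(data)).ljust(HEADER_WIDTH).encode("ascii")
    handle.write(header)
    handle.write(zlib.compress(data, compress))


def read_zblob(handle):
    """Read a blob written by write_zblob and verify the stored length."""
    expected = int(handle.read(HEADER_WIDTH).decode("ascii").strip(), 16)
    data = zlib.decompress(handle.read())
    if len(data) != expected:
        raise ValueError(
            "Incorrect data length: expected %d bytes, got %d"
            % (expected, len(data))
        )
    return data


# Round trip through an in-memory file.
payload = b"\x00\x01\x02\x03" * 1000
buf = io.BytesIO()
write_zblob(buf, payload)
buf.seek(0)
roundtrip = read_zblob(buf)
```

In a scheme like this, the check can only fail if the stored length and the decompressed payload disagree, for example because the header was written or parsed incorrectly, or because the compressed file was truncated or overwritten between the dump and the load.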