Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

Sebastian Raschka Thu, 22 Jan 2015 16:18:46 -0800

Sorry, I think my previous message was a little bit ambiguous.

What I would try is:


1) Unpickle the original pickle file in Python 2
2) Pickle it via joblib
3) Load it in Python 3

(I think you only did step 3, right? Sorry for the confusion.)

I also just saw a related SO post that might be very helpful: 
http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
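
The workaround usually given for this kind of problem is to pass an explicit `encoding` when unpickling under Python 3, so that Python 2 byte strings are not decoded as ASCII. A minimal sketch of that idea, using only the standard library: the byte string below is a hand-written protocol-2 pickle standing in for a file written by Python 2 (it is not taken from this thread), and `load_py2_pickle` is a made-up helper name. It is an assumption, not something confirmed in this thread, that the same option is enough for a pickled RandomForest.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string 'caf\xe9'.
# Illustrative stand-in for data pickled under Python 2.
PY2_PICKLE = b'\x80\x02U\x04caf\xe9q\x00.'


def load_py2_pickle(data, encoding='latin1'):
    """Unpickle Python 2 data under Python 3 with an explicit encoding.

    'latin1' maps every byte to exactly one code point, so no byte is
    rejected; 'bytes' leaves Python 2 str objects as bytes.
    """
    return pickle.loads(data, encoding=encoding)


# With the default encoding ('ASCII') the non-ASCII byte cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

as_text = load_py2_pickle(PY2_PICKLE)            # decoded as latin1 text
as_bytes = load_py2_pickle(PY2_PICKLE, 'bytes')  # kept as raw bytes
```

For a file on disk the equivalent call would be `pickle.load(f, encoding='latin1')` with the file opened in binary mode.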

Best,
Sebastian


> On Jan 22, 2015, at 5:10 PM, [email protected] wrote:
> 
> Hi Sebastian,
> 
> Thanks for the response, but actually joblib doesn't work either:
> 
> In [1]: from sklearn.externals import joblib
> 
> In [2]: rf = joblib.load('rf-1.joblib')
> ---------------------------------------------------------------------------
> error                                     Traceback (most recent call last)
> <ipython-input-3-2c47f0ec1d5b> in <module>()
> ----> 1 rf = joblib.load('rf-1.joblib')
> 
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
>     417                               'ignoring mmap_mode "%(mmap_mode)s" flag passed'
>     418                               % locals(), Warning, stacklevel=2)
> --> 419             unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
>     420         else:
>     421             unpickler = NumpyUnpickler(filename, file_handle=file_handle,
> 
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
>     306         NumpyUnpickler.__init__(self, filename,
>     307                                 file_handle,
> --> 308                                 mmap_mode=None)
>     309
>     310     def _open_pickle(self, file_handle):
> 
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
>     264         self._dirname = os.path.dirname(filename)
>     265         self.mmap_mode = mmap_mode
> --> 266         self.file_handle = self._open_pickle(file_handle)
>     267         Unpickler.__init__(self, self.file_handle)
>     268         try:
> 
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
>     309
>     310     def _open_pickle(self, file_handle):
> --> 311         return BytesIO(read_zfile(file_handle))
>     312
>     313
> 
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
>      66     # We use the known length of the data to tell Zlib the size of the
>      67     # buffer to allocate.
> ---> 68     data = zlib.decompress(file_handle.read(), 15, length)
>      69     assert len(data) == length, (
>      70         "Incorrect data length while decompressing %s."
> 
> error: Error -3 while decompressing data: incorrect header check
> 
> 
> The very same commands work fine in Py2:
> 
> In [1]: from sklearn.externals import joblib
> 
> In [2]: rf1 = joblib.load('rf-1.joblib')
> 
> In [3]:
> 
> 
> Is this unexpected?
> 
> 
> 
> 
> On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka <[email protected]> wrote:
> 
> Hi, Juan, 
> 
> It's been some time, but I remember that I had similar issues. I think it has 
> to do with the numpy arrays that specifically cause problems in pickle. 
> (http://bugs.python.org/issue6784) 
> 
> You could try to use joblib (which should also be more efficient): 
> 
> >>> from sklearn.externals import joblib 
> >>> joblib.dump(clf, 'filename.pkl') 
> >>> clf = joblib.load('filename.pkl') 
> 
> (http://scikit-learn.org/stable/modules/model_persistence.html)       
> 
> 
> Best, 
> Sebastian 
> 
> > On Jan 22, 2015, at 8:50 AM, [email protected] wrote: 
> > 
> > Hi all, 
> > 
> > I'm working on a project that depends on sklearn. I've been upping test 
> > coverage (which includes saving a RandomForest, so far using joblib 
> > serialization), and now I wanted to make the project Python 3-compatible. 
> > However, the final roadblock is the sharing of RF objects: I can't load the 
> > Python 2-serialized RFs with Python 3 tests. Of course, the test outcome 
> > depends on the exact RF that was created a while back. Is there any way 
> > around this? 
> > 
> > Thanks! 
> > 
> > Juan. 
> > 
> > 
> > ------------------------------------------------------------------------------
> >  
> > New Year. New Location. New Benefits. New Data Center in Ashburn, VA. 
> > GigeNET is offering a free month of service with a new server in Ashburn. 
> > Choose from 2 high performing configs, both with 100TB of bandwidth. 
> > Higher redundancy.Lower latency.Increased capacity.Completely compliant. 
> > http://p.sf.net/sfu/gigenet 
> > _______________________________________________ 
> > Scikit-learn-general mailing list 
> > [email protected] 
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general 
> 
> 

