Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

jni . soma Fri, 23 Jan 2015 04:41:50 -0800

Hi everyone,




Using joblib with compress=0 worked! Is it a joblib bug that compress=3 is not 
portable?




Joel, here are the tracebacks for standard Python pickles of increasing 
protocols (0, 1, 2), saved in Python 2 and then loaded in Python 3:








---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

TypeError: 'str' does not support the buffer interface










---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-7-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 595: invalid start byte










---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-100e36105a73> in <module>()
      1 with open('rf-1.pck', 'r') as fin:
----> 2     rf1 = pck.load(fin)
      3

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
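
A note on the three tracebacks above: each one opens the file with mode 'r', 
so Python 3 hands pickle decoded text (or fails while decoding) instead of raw 
bytes. The sketch below, which assumes a small stand-in dict rather than the 
real RandomForest and uses a hypothetical helper name (load_pickle), writes 
pickles at protocols 0, 1 and 2 and shows that binary mode ('rb') loads all of 
them while text mode does not. It only covers the file-mode part of the 
problem, not any Python 2 string decoding that may come afterwards.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickle from disk, reading the file as raw bytes."""
    # 'rb' matters: in text mode Python 3 tries to decode the bytes first.
    with open(path, "rb") as fin:
        return pickle.load(fin)


# Stand-in object (an assumption; the thread uses a RandomForest).
obj = {"n_estimators": 10, "weights": [0.25, 0.75]}

binary_mode_results = {}
text_mode_failed = {}

with tempfile.TemporaryDirectory() as tmp:
    for protocol in (0, 1, 2):
        path = os.path.join(tmp, "rf-%d.pck" % protocol)
        with open(path, "wb") as fout:
            pickle.dump(obj, fout, protocol=protocol)

        # Binary mode: works for every protocol.
        binary_mode_results[protocol] = load_pickle(path)

        # Text mode: fails with TypeError or UnicodeDecodeError.
        try:
            with open(path, "r", encoding="utf-8") as fin:
                pickle.load(fin)
            text_mode_failed[protocol] = False
        except (TypeError, UnicodeDecodeError):
            text_mode_failed[protocol] = True
```
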







Thanks again everyone!








On Friday, Jan 23, 2015 at 1:49 pm, Juan Nunez-Iglesias <[email protected]>, 
wrote:

Joel, *sorry*, I should probably have mentioned this earlier:




joblib.dump takes a "compress" kwarg, which I used, probably 3 as recommended 
by the docstring, so that I wouldn't have a bajillion files representing my RF. 
So the zipping error makes perfect sense, except that I wouldn't expect gzip to 
change between Python versions. ;) I haven't tried using compress=0, but would 
like to avoid that if possible! (these test RFs are in my repo.)




I'm on a different computer right now so will submit pickle traceback later... 
But hoping there's a good joblib-based solution! =)




Juan.








On Fri, Jan 23, 2015 at 1:38 PM, Joel Nothman <[email protected]> wrote:



Could you provide the traceback when using pickle? The joblib error is about 
zipping, which should not be applicable there...



On 23 January 2015 at 13:30, Juan Nunez-Iglesias <[email protected]> wrote:

Nope, the Py2 RF was saved with joblib!




The SO response might work for standard pickling though, I'll give that a try, 
thanks!









On Fri, Jan 23, 2015 at 11:18 AM, Sebastian Raschka <[email protected]> 
wrote:



Sorry, I think my previous message was a little bit ambiguous.




What I would try is:




1) Unpickle the original pickle file in Python 2

2) Pickle it via joblib

3) Load it in Python 3




(I think you only did step 3, right? Sorry for the confusion.)




I also just saw a related SO post that might be very helpful: 
http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
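
For standard pickles, the usual remedy for Python 2 byte strings is the 
encoding argument of pickle.load / pickle.loads in Python 3. The sketch below 
is only an illustration under stated assumptions: PY2_PICKLE is a hand-written 
protocol-2 pickle of the two-byte Python 2 string '\xff\xfe' (a stand-in for 
the raw array buffers a real model would contain), and loads_py2 is a 
hypothetical helper name. Whether this is enough for a full scikit-learn 
estimator is not established here.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xff\xfe'
# (PROTO 2, SHORT_BINSTRING of length 2, BINPUT 0, STOP).
PY2_PICKLE = b"\x80\x02U\x02\xff\xfeq\x00."


def loads_py2(data, encoding="latin1"):
    """Unpickle Python 2 data, decoding its 8-bit strings with `encoding`."""
    return pickle.loads(data, encoding=encoding)


# With the default encoding (ASCII) the non-ASCII bytes cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# latin1 maps every byte to one code point, so nothing is lost.
as_text = loads_py2(PY2_PICKLE)

# encoding="bytes" keeps the Python 2 strings as bytes objects instead.
as_bytes = loads_py2(PY2_PICKLE, encoding="bytes")
```
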





Best,

Sebastian







On Jan 22, 2015, at 5:10 PM, [email protected] wrote:



Hi Sebastian,




Thanks for the response, but actually joblib doesn't work either:





In [1]: from sklearn.externals import joblib

In [2]: rf = joblib.load('rf-1.joblib')
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-3-2c47f0ec1d5b> in <module>()
----> 1 rf = joblib.load('rf-1.joblib')

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
    417                               'ignoring mmap_mode "%(mmap_mode)s" flag passed'
    418                               % locals(), Warning, stacklevel=2)
--> 419             unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
    420         else:
    421             unpickler = NumpyUnpickler(filename, file_handle=file_handle,

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
    306         NumpyUnpickler.__init__(self, filename,
    307                                 file_handle,
--> 308                                 mmap_mode=None)
    309
    310     def _open_pickle(self, file_handle):

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
    264         self._dirname = os.path.dirname(filename)
    265         self.mmap_mode = mmap_mode
--> 266         self.file_handle = self._open_pickle(file_handle)
    267         Unpickler.__init__(self, self.file_handle)
    268         try:

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
    309
    310     def _open_pickle(self, file_handle):
--> 311         return BytesIO(read_zfile(file_handle))
    312
    313

/Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
     66     # We use the known length of the data to tell Zlib the size of the
     67     # buffer to allocate.
---> 68     data = zlib.decompress(file_handle.read(), 15, length)
     69     assert len(data) == length, (
     70         "Incorrect data length while decompressing %s."

error: Error -3 while decompressing data: incorrect header check
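
For readers unfamiliar with this message: zlib raises it when the bytes it is 
handed do not begin with a valid zlib header. The sketch below reproduces that 
class of failure with the standard library alone, by prepending one stray byte 
to an otherwise valid compressed payload. The payload, the stray byte b"L" and 
the helper name try_decompress are all made up for illustration; this is not a 
claim about what joblib actually does to the file.

```python
import zlib

# Stand-in payload (an assumption; the real file holds a pickled model).
payload = zlib.compress(b"stand-in for pickled forest bytes", 3)


def try_decompress(data):
    """Return (ok, message) for a zlib.decompress attempt on `data`."""
    try:
        zlib.decompress(data)
        return True, ""
    except zlib.error as exc:
        return False, str(exc)


# The untouched payload decompresses fine.
clean_ok, _ = try_decompress(payload)

# One extra leading byte means the stream no longer starts with a zlib
# header, so decompression is rejected before any data is read.
shifted_ok, shifted_msg = try_decompress(b"L" + payload)
```

If something similar happens inside read_zfile, for instance because the 
bytes before the compressed stream are read differently on the two Python 
versions, the compressed data itself could still be intact.
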







The very same commands work fine in Py2:





In [1]: from sklearn.externals import joblib

In [2]: rf1 = joblib.load('rf-1.joblib')

In [3]:







Is this unexpected?












On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka <[email protected]> wrote:


Hi, Juan,


It's been some time, but I remember having similar issues. I think it is 
specifically the numpy arrays that cause problems in pickle 
(http://bugs.python.org/issue6784).


You could try to use joblib (which should also be more efficient):


>>> from sklearn.externals import joblib
>>> joblib.dump(clf, 'filename.pkl')
>>> clf = joblib.load('filename.pkl')


(http://scikit-learn.org/stable/modules/model_persistence.html) 



Best,

Sebastian


> On Jan 22, 2015, at 8:50 AM, [email protected] wrote:

> 

> Hi all,

> 

> I'm working on a project that depends on sklearn. I've been building up test 
> coverage (which includes saving a RandomForest, so far using joblib 
> serialization), and now I want to make the project Python 3-compatible. 
> However, the final roadblock is the sharing of RF objects: I can't load the 
> Python 2-serialized RFs in the Python 3 tests. Of course, the test outcome 
> depends on the exact RF that was created a while back. Is there any way around 
> this?

> 

> Thanks!

> 

> Juan.

> 

> 

> ------------------------------------------------------------------------------

> New Year. New Location. New Benefits. New Data Center in Ashburn, VA.

> GigeNET is offering a free month of service with a new server in Ashburn.

> Choose from 2 high performing configs, both with 100TB of bandwidth.

> Higher redundancy.Lower latency.Increased capacity.Completely compliant.

> http://p.sf.net/sfu/gigenet
> _______________________________________________

> Scikit-learn-general mailing list

> [email protected]

> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



