Re: [Scikit-learn-general] Sharing objects between Python 2 and 3

Joel Nothman Sat, 24 Jan 2015 02:27:08 -0800

They all sound related to the Py3k handling of Unicode, in which case I'm
guessing a search should find cases of this issue elsewhere. I'm glad
joblib worked in the end, but maybe it's worth leaving an issue on the
joblib project so that it could be appropriately tested or documented.
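
For anyone finding this thread later, here is a minimal sketch of how a Python 2 pickle might be loaded from Python 3. It assumes the two things the quoted tracebacks point at: the file should be opened in binary mode ('rb' rather than 'r'), and Python 2 byte strings need an explicit encoding. The helper name load_py2_pickle and the hand-written PY2_PICKLE bytes are made up for illustration, and encoding='latin1' is the approach from the Stack Overflow post linked further down; it may not be enough for a full scikit-learn estimator.

```python
import os
import pickle
import tempfile


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle written by Python 2 from Python 3 (hypothetical helper).

    The file is opened in binary mode, since pickle.load expects bytes.
    encoding='latin1' maps every Python 2 byte string to a str without
    raising UnicodeDecodeError; encoding='bytes' keeps them as bytes.
    """
    with open(path, "rb") as fin:
        return pickle.load(fin, encoding=encoding)


# Hand-written protocol-2 pickle of the Python 2 byte string '\xff'.
# It stands in for a real file saved by Python 2 (illustrative data only).
PY2_PICKLE = b"\x80\x02U\x01\xffq\x00."

if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".pck")
    os.write(fd, PY2_PICKLE)
    os.close(fd)
    try:
        print(repr(load_py2_pickle(path)))                    # decoded as latin1
        print(repr(load_py2_pickle(path, encoding="bytes")))  # left as bytes
    finally:
        os.remove(path)
```

With the default encoding (ASCII), the same load raises UnicodeDecodeError on the 0xff byte, which is the same family of failure as the tracebacks below.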


On 23 January 2015 at 23:40, <jni.s...@gmail.com> wrote:

>  Hi everyone,
>
> Using joblib with compress=0 worked! Is it a joblib bug that compress=3 is
> not portable?
>
> Joel, here are the tracebacks from standard Python pickles of increasing
> protocols (0, 1, 2), saved in Python 2 and attempting to load them in
> Python 3:
>
>
>
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-6-100e36105a73> in <module>()
>       1 with open('rf-1.pck', 'r') as fin:
> ----> 2     rf1 = pck.load(fin)
>       3
>
> TypeError: 'str' does not support the buffer interface
>
>
>
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-7-100e36105a73> in <module>()
>       1 with open('rf-1.pck', 'r') as fin:
> ----> 2     rf1 = pck.load(fin)
>       3
>
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
>     311         # decode input (taking the buffer into account)
>     312         data = self.buffer + input
> --> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
>     314         # keep undecoded input until the next call
>     315         self.buffer = data[consumed:]
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 595: invalid start byte
>
>
>
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-5-100e36105a73> in <module>()
>       1 with open('rf-1.pck', 'r') as fin:
> ----> 2     rf1 = pck.load(fin)
>       3
>
> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/codecs.py in decode(self, input, final)
>     311         # decode input (taking the buffer into account)
>     312         data = self.buffer + input
> --> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
>     314         # keep undecoded input until the next call
>     315         self.buffer = data[consumed:]
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
>
>
> Thanks again everyone!
>
> On Friday, Jan 23, 2015 at 1:49 pm, Juan Nunez-Iglesias <
> jni.s...@gmail.com>, wrote:
>
>> Joel, *sorry*, I should probably have mentioned this earlier:
>>
>> joblib.dump takes a "compress" kwarg, which I used, probably 3 as
>> recommended by the docstring, so that I wouldn't have a bajillion files
>> representing my RF. So the zipping error makes perfect sense, except that I
>> wouldn't expect gzip to change between Python versions. ;) I haven't tried
>> using compress=0, but would like to avoid that if possible! (these test RFs
>> are in my repo.)
>>
>> I'm on a different computer right now so will submit pickle traceback
>> later... But hoping there's a good joblib-based solution! =)
>>
>> Juan.
>>
>>
>>
>>
>> On Fri, Jan 23, 2015 at 1:38 PM, Joel Nothman <joel.noth...@gmail.com>
>> wrote:
>>
>>> Could you provide the traceback when using pickle? The joblib error is
>>> about zipping, which should not be applicable there...
>>>
>>> On 23 January 2015 at 13:30, Juan Nunez-Iglesias <jni.s...@gmail.com>
>>> wrote:
>>>
>>>> Nope, the Py2 RF was saved with joblib!
>>>>
>>>> The SO response might work for standard pickling though, I'll give that
>>>> a try, thanks!
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jan 23, 2015 at 11:18 AM, Sebastian Raschka <
>>>> se.rasc...@gmail.com> wrote:
>>>>
>>>>> Sorry, I think my previous message was a little bit ambiguous.
>>>>>
>>>>> What I would try is:
>>>>>
>>>>> 1) Unpickle the original pickle file in Python 2
>>>>> 2) Pickle it via joblib
>>>>> 3) Load it in Python 3
>>>>>
>>>>> (I think you only did step 3, right? Sorry for the confusion.)
>>>>>
>>>>> I also just saw a related SO post that might be very helpful:
>>>>> http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>>
>>>>> On Jan 22, 2015, at 5:10 PM, jni.s...@gmail.com wrote:
>>>>>
>>>>> Hi Sebastian,
>>>>>
>>>>> Thanks for the response, but actually joblib doesn't work either:
>>>>>
>>>>>  In [1]: from sklearn.externals import joblib
>>>>>
>>>>> In [2]: rf = joblib.load('rf-1.joblib')
>>>>>
>>>>> ---------------------------------------------------------------------------
>>>>> error                                     Traceback (most recent call last)
>>>>> <ipython-input-3-2c47f0ec1d5b> in <module>()
>>>>> ----> 1 rf = joblib.load('rf-1.joblib')
>>>>>
>>>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
>>>>>     417                               'ignoring mmap_mode "%(mmap_mode)s" flag passed'
>>>>>     418                               % locals(), Warning, stacklevel=2)
>>>>> --> 419             unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
>>>>>     420         else:
>>>>>     421             unpickler = NumpyUnpickler(filename, file_handle=file_handle,
>>>>>
>>>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
>>>>>     306         NumpyUnpickler.__init__(self, filename,
>>>>>     307                                 file_handle,
>>>>> --> 308                                 mmap_mode=None)
>>>>>     309
>>>>>     310     def _open_pickle(self, file_handle):
>>>>>
>>>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
>>>>>     264         self._dirname = os.path.dirname(filename)
>>>>>     265         self.mmap_mode = mmap_mode
>>>>> --> 266         self.file_handle = self._open_pickle(file_handle)
>>>>>     267         Unpickler.__init__(self, self.file_handle)
>>>>>     268         try:
>>>>>
>>>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
>>>>>     309
>>>>>     310     def _open_pickle(self, file_handle):
>>>>> --> 311         return BytesIO(read_zfile(file_handle))
>>>>>     312
>>>>>     313
>>>>>
>>>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
>>>>>      66     # We use the known length of the data to tell Zlib the size of the
>>>>>      67     # buffer to allocate.
>>>>> ---> 68     data = zlib.decompress(file_handle.read(), 15, length)
>>>>>      69     assert len(data) == length, (
>>>>>      70         "Incorrect data length while decompressing %s."
>>>>>
>>>>> error: Error -3 while decompressing data: incorrect header check
>>>>>
>>>>>
>>>>> The very same commands work fine in Py2:
>>>>>
>>>>>  In [1]: from sklearn.externals import joblib
>>>>>
>>>>> In [2]: rf1 = joblib.load('rf-1.joblib')
>>>>>
>>>>> In [3]:
>>>>>
>>>>>
>>>>> Is this unexpected?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka <
>>>>> se.rasc...@gmail.com> wrote:
>>>>>
>>>>>> Hi, Juan,
>>>>>>
>>>>>> It's been some time, but I remember that I had similar issues. I
>>>>>> think it has to do with the numpy arrays that specifically cause problems
>>>>>> in pickle. (http://bugs.python.org/issue6784)
>>>>>>
>>>>>> You could try to use joblib (which should also be more efficient):
>>>>>>
>>>>>> >>> from sklearn.externals import joblib
>>>>>> >>> joblib.dump(clf, 'filename.pkl')
>>>>>> >>> clf = joblib.load('filename.pkl')
>>>>>>
>>>>>> (http://scikit-learn.org/stable/modules/model_persistence.html)
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Sebastian
>>>>>>
>>>>>> > On Jan 22, 2015, at 8:50 AM, jni.s...@gmail.com wrote:
>>>>>> >
>>>>>> > Hi all,
>>>>>> >
>>>>>> > I'm working on a project that depends on sklearn. I've been upping test
>>>>>> > coverage (which includes saving a RandomForest, so far using joblib
>>>>>> > serialization), and now I want to make the project Python 3-compatible.
>>>>>> > However, the final roadblock is the sharing of RF objects: I can't load
>>>>>> > the Python 2-serialized RFs with Python 3 tests. Of course, the test
>>>>>> > outcome depends on the exact RF that was created a while back. Is there
>>>>>> > any way around this?
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> > Juan.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ------------------------------------------------------------------------------
>>>>>> > New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
>>>>>> > GigeNET is offering a free month of service with a new server in Ashburn.
>>>>>> > Choose from 2 high performing configs, both with 100TB of bandwidth.
>>>>>> > Higher redundancy.Lower latency.Increased capacity.Completely compliant.
>>>>>> > http://p.sf.net/sfu/gigenet
>>>>>> > _______________________________________________
>>>>>> > Scikit-learn-general mailing list
>>>>>> > Scikit-learn-general@lists.sourceforge.net
>>>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
>
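
The joblib traceback quoted above ends in zlib's "incorrect header check". As a minimal sketch using only the standard library (this is not joblib's code), the snippet below shows that a zlib stream round-trips when it is decompressed exactly as written, and that zlib raises an error of this class when the bytes it receives do not begin with a valid zlib header. The payload and the b"PREFIX" bytes are made-up placeholders, and nothing here establishes what actually went wrong in the compress=3 file.

```python
import zlib

# Placeholder payload standing in for pickled data (hypothetical).
payload = b"pickled bytes would go here" * 10

# Compress at level 3, then decompress with wbits=15, the value that
# read_zfile passes in the traceback above.
compressed = zlib.compress(payload, 3)
round_trip = zlib.decompress(compressed, 15)


def try_decompress(data):
    """Return the decompressed bytes, or the zlib.error that was raised."""
    try:
        return zlib.decompress(data, 15)
    except zlib.error as exc:
        return exc


# Extra bytes in front of the stream mean zlib no longer sees its header.
# b"PREFIX" is an arbitrary made-up value, not joblib's real file prefix.
broken = try_decompress(b"PREFIX" + compressed)

print(round_trip == payload)
print(type(broken).__name__, broken)
```

If the round trip succeeds on both interpreters, that would suggest looking at how the bytes around the compressed stream are written and read, rather than at zlib itself.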

