Joel, *sorry*, I should probably have mentioned this earlier:

joblib.dump takes a "compress" kwarg, which I used (probably compress=3, as
recommended by the docstring) so that I wouldn't have a bajillion files
representing my RF. So the zipping error makes perfect sense, except that I
wouldn't expect zlib compression to change between Python versions. ;) I
haven't tried compress=0, but I'd like to avoid that if possible! (These test
RFs are in my repo.)
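If the goal is just one compressed file per RF, a stdlib-only sketch of the same idea is a gzip-compressed pickle. This is a swapped-in technique, not joblib's own format (the traceback further down shows joblib reading its files with zlib.decompress). dump_compressed and load_compressed are hypothetical helper names, and protocol 2 is chosen because it is the newest pickle protocol Python 2 can read and write.

```python
import gzip
import pickle


def dump_compressed(obj, path, level=3):
    """Write obj to a single gzip-compressed pickle file (hypothetical helper).

    Protocol 2 is the newest pickle protocol Python 2 supports, so the same
    file format can be produced and read by both interpreters.
    """
    with gzip.open(path, 'wb', compresslevel=level) as f:
        pickle.dump(obj, f, protocol=2)


def load_compressed(path):
    """Read back a file written by dump_compressed (hypothetical helper)."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)
```

The gzip container itself is stable across Python versions; what can still break is the pickle inside it. A file written by Python 2 that contains numpy arrays may still need the encoding='latin1' workaround from the SO post quoted below when loaded in Python 3.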

I'm on a different computer right now, so I'll send the pickle traceback
later... But I'm hoping there's a good joblib-based solution! =)

Juan.

On Fri, Jan 23, 2015 at 1:38 PM, Joel Nothman <joel.noth...@gmail.com>
wrote:

> Could you provide the traceback when using pickle? The joblib error is
> about zipping, which should not be applicable there...
> On 23 January 2015 at 13:30, Juan Nunez-Iglesias <jni.s...@gmail.com> wrote:
>> Nope, the Py2 RF was saved with joblib!
>>
>> The SO response might work for standard pickling though, I'll give that a
>> try, thanks!
>>
>> On Fri, Jan 23, 2015 at 11:18 AM, Sebastian Raschka <se.rasc...@gmail.com>
>> wrote:
>>
>>> Sorry, I think my previous message was a little bit ambiguous.
>>>
>>> What I would try is:
>>>
>>> 1) Unpickle the original pickle file in Python 2
>>> 2) Pickle it via joblib
>>> 3) Load it in Python 3
>>>
>>> (I think you only did step 3, right? Sorry for the confusion.)
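A minimal sketch of steps 1 and 2, using the standard-library pickle module in place of joblib (an assumption; with joblib you would call joblib.load and joblib.dump instead). The function name resave and the file paths are hypothetical. For objects that contain numpy arrays, step 3 under Python 3 may still need the encoding workaround from the SO post linked just after this list.

```python
import pickle


def resave(src_path, dst_path):
    """Steps 1 and 2: load a pickle, then write it back out with protocol 2.

    Meant to be run under Python 2; protocol 2 is the highest protocol
    Python 2 supports, and Python 3 can read it.
    """
    with open(src_path, 'rb') as f:
        obj = pickle.load(f)          # step 1: unpickle in the old interpreter
    with open(dst_path, 'wb') as f:
        pickle.dump(obj, f, protocol=2)  # step 2: re-serialize
    return obj
```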
>>>
>>> I also just saw a related SO post that might be very helpful:
>>> http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
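The workaround usually given for that problem is to pass encoding='latin1' to Python 3's pickle.load / pickle.loads. numpy pickles raw array buffers as Python 2 str objects, and 'latin1' decodes those byte-for-byte instead of failing under the default ASCII decoding. The sketch below uses a hand-built protocol-2 pickle of a Python 2 byte string rather than a real RF file; load_py2_pickle and PY2_STR_PICKLE are hypothetical names. For a file on disk the call would be pickle.load(f, encoding='latin1'). It does not show whether the same option can be passed through joblib's own loader.

```python
import pickle

# Hand-built protocol-2 pickle of a Python 2 byte string '\xff\xfe'
# (opcodes: PROTO 2, SHORT_BINSTRING len=2, BINPUT 0, STOP).
PY2_STR_PICKLE = b'\x80\x02U\x02\xff\xfeq\x00.'


def load_py2_pickle(data, encoding='latin1'):
    """Unpickle bytes written by Python 2 (hypothetical helper).

    'latin1' maps every byte 0-255 to the code point with the same value,
    so binary Python 2 str payloads survive the trip intact.
    """
    return pickle.loads(data, encoding=encoding)


try:
    pickle.loads(PY2_STR_PICKLE)  # default encoding='ASCII' rejects byte 0xff
except UnicodeDecodeError as exc:
    print('default load fails:', exc.reason)

print(repr(load_py2_pickle(PY2_STR_PICKLE)))
```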
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> On Jan 22, 2015, at 5:10 PM, jni.s...@gmail.com wrote:
>>>
>>> Hi Sebastian,
>>>
>>> Thanks for the response, but actually joblib doesn't work either:
>>>
>>> In [1]: from sklearn.externals import joblib
>>>
>>> In [2]: rf = joblib.load('rf-1.joblib')
>>>
>>> ---------------------------------------------------------------------------
>>> error                                     Traceback (most recent call last)
>>> <ipython-input-3-2c47f0ec1d5b> in <module>()
>>> ----> 1 rf = joblib.load('rf-1.joblib')
>>>
>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
>>>     417                               'ignoring mmap_mode "%(mmap_mode)s" flag passed'
>>>     418                               % locals(), Warning, stacklevel=2)
>>> --> 419             unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
>>>     420         else:
>>>     421             unpickler = NumpyUnpickler(filename, file_handle=file_handle,
>>>
>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle)
>>>     306         NumpyUnpickler.__init__(self, filename,
>>>     307                                 file_handle,
>>> --> 308                                 mmap_mode=None)
>>>     309
>>>     310     def _open_pickle(self, file_handle):
>>>
>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in __init__(self, filename, file_handle, mmap_mode)
>>>     264         self._dirname = os.path.dirname(filename)
>>>     265         self.mmap_mode = mmap_mode
>>> --> 266         self.file_handle = self._open_pickle(file_handle)
>>>     267         Unpickler.__init__(self, self.file_handle)
>>>     268         try:
>>>
>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in _open_pickle(self, file_handle)
>>>     309
>>>     310     def _open_pickle(self, file_handle):
>>> --> 311         return BytesIO(read_zfile(file_handle))
>>>     312
>>>     313
>>>
>>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py in read_zfile(file_handle)
>>>      66     # We use the known length of the data to tell Zlib the size of the
>>>      67     # buffer to allocate.
>>> ---> 68     data = zlib.decompress(file_handle.read(), 15, length)
>>>      69     assert len(data) == length, (
>>>      70         "Incorrect data length while decompressing %s."
>>>
>>> error: Error -3 while decompressing data: incorrect header check
>>>
>>>
>>> The very same commands work fine in Py2:
>>>
>>> In [1]: from sklearn.externals import joblib
>>>
>>> In [2]: rf1 = joblib.load('rf-1.joblib')
>>>
>>> In [3]:
>>>
>>>
>>> Is this unexpected?
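For context, "Error -3 ... incorrect header check" is zlib's own message. With wbits=15, zlib.decompress expects its input to start with a two-byte zlib header, so it fails this way whenever it is handed bytes that are not the start of a zlib stream, for example when a reader skips the wrong number of container-header bytes before decompressing. The stdlib sketch below reproduces the message; the 4-byte HDR: tag and read_after_header are made up for illustration and are not joblib's actual file layout. It shows what the message means, not why this particular Python 2 file fails under Python 3.

```python
import zlib

# Stand-in payload; in joblib's case this would be the pickled estimator.
payload = b'pretend this is a pickled RandomForest ' * 20
stream = zlib.compress(payload, 3)

# Hypothetical container: a 4-byte tag written in front of the zlib stream.
HEADER = b'HDR:'
blob = HEADER + stream


def read_after_header(blob, header_len):
    """Skip header_len bytes, then inflate the rest as a zlib stream (wbits=15)."""
    return zlib.decompress(blob[header_len:], 15)


assert read_after_header(blob, len(HEADER)) == payload  # correct offset works

try:
    read_after_header(blob, 0)  # wrong offset: zlib reads b'HD' as its header
except zlib.error as exc:
    print(exc)  # Error -3 while decompressing data: incorrect header check
```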
>>>
>>> On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka <se.rasc...@gmail.com>
>>> wrote:
>>>
>>>> Hi, Juan,
>>>>
>>>> It's been some time, but I remember having similar issues. I think it
>>>> has to do with numpy arrays specifically, which cause problems when
>>>> pickling across Python 2 and 3 (http://bugs.python.org/issue6784).
>>>>
>>>> You could try to use joblib (which should also be more efficient):
>>>>
>>>> >>> from sklearn.externals import joblib
>>>> >>> joblib.dump(clf, 'filename.pkl')
>>>> >>> clf = joblib.load('filename.pkl')
>>>>
>>>> (http://scikit-learn.org/stable/modules/model_persistence.html)
>>>>
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>> > On Jan 22, 2015, at 8:50 AM, jni.s...@gmail.com wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I'm working on a project that depends on sklearn. I've been upping test
>>>> > coverage (which includes saving a RandomForest, so far using joblib
>>>> > serialization), and now I want to make the project Python 3-compatible.
>>>> > However, the final roadblock is sharing RF objects: I can't load the
>>>> > Python 2-serialized RFs in the Python 3 tests. Of course, the test
>>>> > outcome depends on the exact RF that was created a while back. Is there
>>>> > any way around this?
>>>> >
>>>> > Thanks!
>>>> >
>>>> > Juan.
>>>> >
>>>> >
>>>> >
>>>> ------------------------------------------------------------------------------
>>>>
>>>> > New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
>>>> > GigeNET is offering a free month of service with a new server in
>>>> Ashburn.
>>>> > Choose from 2 high performing configs, both with 100TB of bandwidth.
>>>> > Higher redundancy.Lower latency.Increased capacity.Completely
>>>> compliant.
>>>> >
>>>> http://p.sf.net/sfu/gigenet_______________________________________________
>>>> > Scikit-learn-general mailing list
>>>> > Scikit-learn-general@lists.sourceforge.net
>>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>