Could you provide the traceback when using pickle? The joblib error is
about zipping, which should not be applicable there...
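
In case it helps, here is a minimal sketch of how that traceback could be captured under Python 3 with the standard library only. The helper name `load_py2_pickle` and its `path` argument are hypothetical, and trying `encoding='latin1'` as a second attempt is an assumption on my part: it is an option of Python 3's `pickle.load` that is often suggested for Python 2 pickles containing NumPy arrays, and I have not checked it against your file.

```python
import pickle
import traceback


def load_py2_pickle(path):
    """Try to load a pickle file; return (object, traceback_text).

    Hypothetical helper: tries the default settings first, then
    encoding='latin1'. traceback_text is None on success; otherwise it
    holds the formatted traceback of the last failed attempt.
    """
    last_tb = None
    for kwargs in ({}, {"encoding": "latin1"}):
        try:
            with open(path, "rb") as f:
                return pickle.load(f, **kwargs), None
        except Exception:
            # Keep the full traceback text so it can be pasted into a reply.
            last_tb = traceback.format_exc()
    return None, last_tb
```

If both attempts fail, the second element of the returned tuple is the text I would like to see.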
On 23 January 2015 at 13:30, Juan Nunez-Iglesias <jni.s...@gmail.com> wrote:
> Nope, the Py2 RF was saved with joblib!
>
> The SO response might work for standard pickling though, I'll give that a
> try, thanks!
>
>
>
>
> On Fri, Jan 23, 2015 at 11:18 AM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Sorry, I think my previous message was a little bit ambiguous.
>>
>> What I would try is:
>>
>> 1) Unpickle the original pickle file in Python 2
>> 2) Pickle it via joblib
>> 3) Load it in Python 3
>>
>> (I think you only did step 3, right? Sorry for the confusion.)
>>
>> I also just saw a related SO post that might be very helpful:
>> http://stackoverflow.com/questions/11305790/pickle-incompatability-of-numpy-arrays-between-python-2-and-3
>>
>> Best,
>> Sebastian
>>
>>
>> On Jan 22, 2015, at 5:10 PM, jni.s...@gmail.com wrote:
>>
>> Hi Sebastian,
>>
>> Thanks for the response, but actually joblib doesn't work either:
>>
>> In [1]: from sklearn.externals import joblib
>>
>> In [2]: rf = joblib.load('rf-1.joblib')
>>
>> ---------------------------------------------------------------------------
>> error Traceback (most recent call last)
>> <ipython-input-3-2c47f0ec1d5b> in <module>()
>> ----> 1 rf = joblib.load('rf-1.joblib')
>>
>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
>> in load(filename, mmap_mode)
>> 417 'ignoring mmap_mode "%(mmap_mode)s" flag passed'
>> 418 % locals(), Warning, stacklevel=2)
>> --> 419 unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle)
>> 420 else:
>> 421 unpickler = NumpyUnpickler(filename, file_handle=file_handle,
>>
>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
>> in __init__(self, filename, file_handle)
>> 306 NumpyUnpickler.__init__(self, filename,
>> 307 file_handle,
>> --> 308 mmap_mode=None)
>> 309
>> 310 def _open_pickle(self, file_handle):
>>
>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
>> in __init__(self, filename, file_handle, mmap_mode)
>> 264 self._dirname = os.path.dirname(filename)
>> 265 self.mmap_mode = mmap_mode
>> --> 266 self.file_handle = self._open_pickle(file_handle)
>> 267 Unpickler.__init__(self, self.file_handle)
>> 268 try:
>>
>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
>> in _open_pickle(self, file_handle)
>> 309
>> 310 def _open_pickle(self, file_handle):
>> --> 311 return BytesIO(read_zfile(file_handle))
>> 312
>> 313
>>
>> /Users/nuneziglesiasj/anaconda/envs/py3k-gala/lib/python3.3/site-packages/sklearn/externals/joblib/numpy_pickle.py
>> in read_zfile(file_handle)
>> 66 # We use the known length of the data to tell Zlib the size of the
>> 67 # buffer to allocate.
>> ---> 68 data = zlib.decompress(file_handle.read(), 15, length)
>> 69 assert len(data) == length, (
>> 70 "Incorrect data length while decompressing %s."
>>
>> error: Error -3 while decompressing data: incorrect header check
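
For reference, zlib typically reports this message when the bytes it is given do not begin with a valid zlib header. One possibility (an assumption, not a diagnosis) is that the bytes reaching `zlib.decompress` here are not laid out the way the Python 3 code expects. A minimal standard-library sketch of the failure mode, where `try_decompress` is a hypothetical helper and the payload is made up:

```python
import zlib


def try_decompress(data):
    """Return (decompressed_bytes, error_message); one of them is None."""
    try:
        return zlib.decompress(data), None
    except zlib.error as exc:
        return None, str(exc)


payload = zlib.compress(b"random forest bytes")

# A well-formed zlib stream round-trips cleanly.
ok_data, ok_err = try_decompress(payload)

# With extra bytes in front, the stream no longer starts with a zlib
# header, so decompression fails with a zlib data error.
bad_data, bad_err = try_decompress(b"XX" + payload)
```

Printing `bad_err` shows the text of the zlib error, which can be compared with the one in the traceback above.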
>>
>>
>> The very same commands work fine in Py2:
>>
>> In [1]: from sklearn.externals import joblib
>>
>> In [2]: rf1 = joblib.load('rf-1.joblib')
>>
>> In [3]:
>>
>>
>> Is this unexpected?
>>
>>
>>
>>
>> On Fri, Jan 23, 2015 at 1:57 AM, Sebastian Raschka <se.rasc...@gmail.com>
>> wrote:
>>
>>> Hi, Juan,
>>>
>>> It's been some time, but I remember that I had similar issues. I think
>>> it has to do with the numpy arrays that specifically cause problems in
>>> pickle. (http://bugs.python.org/issue6784)
>>>
>>> You could try to use joblib (which should also be more efficient):
>>>
>>> >>> from sklearn.externals import joblib
>>> >>> joblib.dump(clf, 'filename.pkl')
>>> >>> clf = joblib.load('filename.pkl')
>>>
>>> (http://scikit-learn.org/stable/modules/model_persistence.html)
>>>
>>>
>>> Best,
>>> Sebastian
>>>
>>> > On Jan 22, 2015, at 8:50 AM, jni.s...@gmail.com wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I'm working on a project that depends on sklearn. I've been upping test
>>> > coverage (which includes saving a RandomForest, so far using joblib
>>> > serialization), and now I want to make the project Python 3-compatible.
>>> > However, the final roadblock is the sharing of RF objects: I can't load the
>>> > Python 2-serialized RFs in the Python 3 tests. Of course, the test outcome
>>> > depends on the exact RF that was created a while back. Is there any way
>>> > around this?
>>> >
>>> > Thanks!
>>> >
>>> > Juan.
>>> >
>>> >
>>> >
>>> > ------------------------------------------------------------------------------
>>> > New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
>>> > GigeNET is offering a free month of service with a new server in Ashburn.
>>> > Choose from 2 high performing configs, both with 100TB of bandwidth.
>>> > Higher redundancy.Lower latency.Increased capacity.Completely compliant.
>>> > http://p.sf.net/sfu/gigenet
>>> > _______________________________________________
>>> > Scikit-learn-general mailing list
>>> > Scikit-learn-general@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>